BS4: Google next page raises "Only the following pseudo-classes are implemented: nth-of-type"

I can scrape the first page successfully, but the script won't move on to the second page. Note that I don't want to do this with Selenium.

```python
import requests
from bs4 import BeautifulSoup

url = 'https://google.com/search?q=In+order+to&hl=en'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

page = 1
while True:
    print()
    print('Page {}...'.format(page))
    print('-' * 80)

    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

    for h in soup.select('h3'):
        print(h.get_text(strip=True))

    next_link = soup.select_one('a:contains("Next")')
    if not next_link:
        break

    url = 'https://google.com' + next_link['href']
    page += 1
```

Result:

```
Page 1...
--------------------------------------------------------------------------------
In order to Synonyms, In order to Antonyms | Thesaurus.com
In order to - English Grammar Today - Cambridge Dictionary
in order to - Wiktionary
What is another word for "in order to"? - WordHippo
In Order For (someone or something) To | Definition of In ...
In Order For | Definition of In Order For by Merriam-Webster
In order to definition and meaning | Collins English Dictionary
Using "in order to" in English - English Study Page
IN ORDER (FOR SOMEONE / SOMETHING ) TO DO ...
262 In Order To synonyms - Other Words for In Order To
Searches related to In order to
Only the following pseudo-classes are implemented: nth-of-type.
```

The error is raised on this line:

```python
next_link = soup.select_one('a:contains("Next")')
```
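A likely explanation, offered as context rather than as the accepted answer: this message comes from Beautiful Soup releases older than 4.7.0, whose built-in `select()` implemented only the `:nth-of-type` pseudo-class. That would also explain why the page 1 titles print first: `select('h3')` uses no pseudo-class, and only the `:contains()` selector fails. From 4.7.0 on, `select()` is delegated to the soupsieve package, which supports `:contains()` (spelled `:-soup-contains()` from soupsieve 2.1 on), so `pip install -U beautifulsoup4` is one option. Another is to avoid pseudo-classes entirely. The sketch below does that with `find_all()` and `get_text()`; `find_link_by_text` is a hypothetical helper name, and the sample HTML is invented rather than copied from a real Google page.

```python
from bs4 import BeautifulSoup


def find_link_by_text(soup, text):
    """Return the first <a href=...> whose text contains `text`, or None.

    Uses only find_all() and get_text(), so it does not depend on the
    pseudo-class support of select()/select_one().
    """
    for a in soup.find_all('a', href=True):
        # get_text() includes nested tags, e.g. <a><span>Next</span></a>
        if text in a.get_text():
            return a
    return None


# Hypothetical markup standing in for a results page; Google's real
# markup is an assumption here and can change at any time.
html = '<a href="/">Home</a><a href="/search?start=10"><span>Next</span></a>'
soup = BeautifulSoup(html, 'html.parser')

next_link = find_link_by_text(soup, 'Next')
print(next_link['href'])  # /search?start=10
```

In the loop from the question, `next_link = find_link_by_text(soup, 'Next')` would take the place of the `select_one(...)` call.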

侃侃无极


1 Answer

MMMHUHU

You can use the `lxml` parser instead of `html.parser`. Install it with `pip install lxml`, then pass `'lxml'` to `BeautifulSoup`:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://google.com/search?q=In+order+to&hl=en'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

page = 1
while True:
    print()
    print('Page {}...'.format(page))
    print('-' * 80)

    # The only change from the question: parse with 'lxml' instead of 'html.parser'
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'lxml')

    for h in soup.select('h3'):
        print(h.get_text(strip=True))

    next_link = soup.select_one('a:contains("Next")')
    if not next_link:
        break

    url = 'https://google.com' + next_link['href']
    page += 1
```
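If the script might also run where `lxml` is not installed, one hedged option is to try `lxml` first and fall back to Python's built-in `html.parser`. This relies on `BeautifulSoup` raising `bs4.FeatureNotFound` when the requested parser is unavailable; `make_soup` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse `markup` with lxml when it is installed, else with html.parser."""
    try:
        return BeautifulSoup(markup, 'lxml')
    except FeatureNotFound:
        # No tree builder named 'lxml' is available: use the stdlib parser.
        return BeautifulSoup(markup, 'html.parser')


soup = make_soup('<h3>Result one</h3><h3>Result two</h3>')
print([h.get_text(strip=True) for h in soup.select('h3')])
```

In the loop, `soup = make_soup(requests.get(url, headers=headers).content)` would replace the `BeautifulSoup(...)` call. Either parser yields the same `h3` titles for simple markup like this, though the two can build different trees for malformed HTML.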

