My web scraping code using BeautifulSoup doesn't get past the first page

3 Answers

浮云间

If you want to go through the first 6 pages, change the range in the loop:

    for i in range(6):   # the first page is addressed at index 0

or:

    for i in range(0, 6):

instead of:

    for i in range(1, 5):   # this starts from the second page, since the second page is indexed at 1
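To make the indexing concrete, here is a minimal sketch that builds the page URLs without fetching anything. The URL pattern is the one used elsewhere in this thread; `page_urls` is a hypothetical helper name introduced only for illustration.

```python
# URL pattern taken from the thread; the last path segment is the zero-based page index.
BASE_URL = "https://www.nairaland.com/search/ipob/0/0/0/{}"


def page_urls(first_page, page_count):
    """Return search URLs for `page_count` pages starting at index `first_page`."""
    return [BASE_URL.format(i) for i in range(first_page, first_page + page_count)]


# range(6) yields 0..5, so the first page (index 0) is included.
for url in page_urls(0, 6):
    print(url)

# range(1, 5) yields 1..4: it skips the first page and stops after the fifth.
print(list(range(1, 5)))
```

Printing the URLs before sending any requests is a cheap way to check that the loop covers the pages you expect.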


慕莱坞森

Try:

    import requests
    from bs4 import BeautifulSoup

    for i in range(6):
        url = 'https://www.nairaland.com/search/ipob/0/0/0/{}'.format(i)
        the_word = 'afonja'

        r = requests.get(url, allow_redirects=False)
        soup = BeautifulSoup(r.content, 'lxml')
        # find_all() returns every text node containing the word;
        # find() would return only the first one, and len() of that is a character count.
        words = soup.find_all(string=lambda text: text and the_word in text)

        print(words)
        count = len(words)  # number of text nodes that contain the word
        print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))

Edit after the new specification.

Assuming the word to count is the same as the one in the URL, you can see that the word is highlighted on the page and can be identified in the HTML by span class=highlight. So you can use this code:

    import requests
    from bs4 import BeautifulSoup

    for i in range(6):
        url = 'https://www.nairaland.com/search/afonja/0/0/0/{}'.format(i)
        the_word = 'afonja'

        r = requests.get(url, allow_redirects=False)
        soup = BeautifulSoup(r.content, 'lxml')
        # each highlighted search hit is wrapped in <span class="highlight">
        count = len(soup.find_all('span', {'class': 'highlight'}))

        print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))

You get:

    Url: https://www.nairaland.com/search/afonja/0/0/0/0
    contains 30 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/1
    contains 31 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/2
    contains 36 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/3
    contains 30 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/4
    contains 45 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/5
    contains 50 occurrences of word: afonja
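If you count by text instead of by the highlight class, keep in mind that `len()` of a string is its character count, not the number of times a word appears in it. Below is a minimal sketch of counting occurrences in plain text, for example text obtained from `soup.get_text()`. The helper name `count_word` and the sample sentence are assumptions made for illustration, and whole-word, case-insensitive matching is a design choice, not something the thread specifies.

```python
import re


def count_word(text, word):
    """Count case-insensitive whole-word occurrences of `word` in `text`."""
    pattern = r"\b{}\b".format(re.escape(word))
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Made-up sample text; "afonjas" is not a whole-word match.
sample = "Afonja said this. Then afonja replied; nobody mentioned afonjas."

print(len(sample))                   # character count of the string, not a word count
print(count_word(sample, "afonja"))  # prints 2
```

Dropping the `\b` anchors would also count substrings such as "afonjas", which may or may not be what you want.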


侃侃无极

As an aside, the search term has its own class name, so you can simply count those elements. The following correctly returns 0 when the term is not found on the page. You could use this approach in your loop.

    import requests
    from bs4 import BeautifulSoup as bs

    r = requests.get('https://www.nairaland.com/search?q=afonja&board=0&topicsonly=2')
    soup = bs(r.content, 'lxml')
    # every highlighted search hit carries the CSS class "highlight"
    occurrences = len(soup.select('.highlight'))
    print(occurrences)

In a loop:

    import requests
    from bs4 import BeautifulSoup as bs

    for i in range(9):
        r = requests.get('https://www.nairaland.com/search/afonja/0/0/0/{}'.format(i))
        soup = bs(r.content, 'lxml')
        occurrences = len(soup.select('.highlight'))
        print(occurrences)
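To check the counting logic without hitting the site, here is a rough offline sketch. It swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so that it runs with no third-party packages; it is not the approach used in the answers above. The `HighlightCounter` class, the `count_highlights` helper, and both HTML samples are made up for illustration; the only thing taken from the thread is that hits carry the class `highlight`.

```python
from html.parser import HTMLParser


class HighlightCounter(HTMLParser):
    """Counts start tags whose class attribute includes `highlight`."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        classes = (dict(attrs).get("class") or "").split()
        if "highlight" in classes:
            self.count += 1


def count_highlights(html):
    """Return how many elements in `html` carry the `highlight` class."""
    counter = HighlightCounter()
    counter.feed(html)
    return counter.count


# Hypothetical markup, not copied from the real site.
with_hits = (
    '<p><span class="highlight">afonja</span> and '
    '<span class="highlight big">Afonja</span></p>'
)
without_hits = "<p>No matches on this page.</p>"

print(count_highlights(with_hits))     # prints 2
print(count_highlights(without_hits))  # prints 0
```

As with `len(soup.select('.highlight'))`, a page with no hits simply yields 0, so no special case is needed inside the loop.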
