My web scraping code using BeautifulSoup doesn't get past the first page

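It doesn't seem to get past the first page. What's wrong? Also, if the word you are looking for is inside a link, it doesn't report the correct number of occurrences: it shows 5 outputs, each giving 5 as the occurrence count.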


import requests
from bs4 import BeautifulSoup

for i in range(1, 5):
    url = 'https://www.nairaland.com/search/ipob/0/0/0/{}'.format(i)
    the_word = 'is'
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.content, 'lxml')
    words = soup.find(text=lambda text: text and the_word in text)
    print(words)
    count = len(words)
    print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))


FFIVE
3 Answers

浮云间

If you want to go through the first 6 pages, change the range in your loop:

for i in range(6):   # the first page is addressed at index `0`

or:

for i in range(0, 6):

instead of:

for i in range(1, 5):    # this will start from the second page, since the second page is indexed at `1`
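As a quick check of which pages a given range covers, the URLs can be built without making any requests. This is only a sketch: `BASE_URL` and `page_urls` are names made up for this illustration, and the URL pattern is the one from the question.

```python
# Hypothetical helper names; the URL pattern is copied from the question.
BASE_URL = 'https://www.nairaland.com/search/ipob/0/0/0/{}'


def page_urls(page_count):
    """Return the search URLs for the first `page_count` pages, starting at index 0."""
    return [BASE_URL.format(i) for i in range(page_count)]


urls = page_urls(6)
for url in urls:
    print(url)
```

The first printed URL ends in /0 (the first page) and the last one ends in /5, whereas range(1, 5) would produce only the URLs ending in /1 through /4.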

慕莱坞森

Try:

import requests
from bs4 import BeautifulSoup

for i in range(6):
    url = 'https://www.nairaland.com/search/ipob/0/0/0/{}'.format(i)
    the_word = 'afonja'
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.content, 'lxml')
    # find() returns only the first matching text node, or None if there is no match
    words = soup.find(text=lambda text: text and the_word in text)
    print(words)
    count = 0
    if words:
        # note: this is the length of that text node, not a number of matches
        count = len(words)
    print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))

Edit after the new specification.

Assuming the word to count is the same as the search term in the URL, you can see that the word is highlighted on the page and can be identified in the HTML by span class=highlight. So you can use this code:

import requests
from bs4 import BeautifulSoup

for i in range(6):
    url = 'https://www.nairaland.com/search/afonja/0/0/0/{}'.format(i)
    the_word = 'afonja'
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.content, 'lxml')
    # every highlighted search hit is wrapped in <span class="highlight">
    count = len(soup.find_all('span', {'class': 'highlight'}))
    print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))

You get:

Url: https://www.nairaland.com/search/afonja/0/0/0/0
contains 30 occurrences of word: afonja

Url: https://www.nairaland.com/search/afonja/0/0/0/1
contains 31 occurrences of word: afonja

Url: https://www.nairaland.com/search/afonja/0/0/0/2
contains 36 occurrences of word: afonja

Url: https://www.nairaland.com/search/afonja/0/0/0/3
contains 30 occurrences of word: afonja

Url: https://www.nairaland.com/search/afonja/0/0/0/4
contains 45 occurrences of word: afonja

Url: https://www.nairaland.com/search/afonja/0/0/0/5
contains 50 occurrences of word: afonja
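To see why counting the highlight spans works, the same idea can be tried offline on a small piece of HTML. This is a minimal sketch: `SAMPLE_HTML` is invented markup that only imitates the structure described above (it is not copied from nairaland.com), `count_highlights` is a hypothetical helper name, and the built-in 'html.parser' is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page is assumed to wrap
# each search hit in <span class="highlight"> as described above.
SAMPLE_HTML = """
<div>
  <p>First post about <span class="highlight">afonja</span>.</p>
  <p><a href="#">A link with <span class="highlight">Afonja</span> inside</a></p>
  <p><span class="other">not a hit</span></p>
</div>
"""


def count_highlights(html):
    """Count the <span class="highlight"> elements in an HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    return len(soup.find_all('span', class_='highlight'))


count = count_highlights(SAMPLE_HTML)
print(count)
```

Because find_all searches the whole tree, a hit nested inside a link is counted just like one in plain paragraph text.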

侃侃无极

By the way, the search term has its own class name, so you can simply count those elements. The following also correctly returns 0 when the term is not found on the page. You can use this approach in a loop.

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.nairaland.com/search?q=afonja&board=0&topicsonly=2')
soup = bs(r.content, 'lxml')
# select() returns a list of all elements with class "highlight"
occurrences = len(soup.select('.highlight'))
print(occurrences)

import requests
from bs4 import BeautifulSoup as bs

for i in range(9):
    r = requests.get('https://www.nairaland.com/search/afonja/0/0/0/{}'.format(i))
    soup = bs(r.content, 'lxml')
    occurrences = len(soup.select('.highlight'))
    print(occurrences)
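If the word to count is not the search term, and so is not highlighted, one possible alternative is to count whole-word matches in the page text with a regular expression. This is only a sketch under that assumption: `count_word` is a hypothetical helper, and the sample string stands in for text that could be obtained from a page with soup.get_text().

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    # \b word boundaries keep 'is' from matching inside 'This' or 'list'
    pattern = r'\b{}\b'.format(re.escape(word))
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


sample = 'This is a test. Is it? The list is short.'
print(count_word(sample, 'is'))
```

A plain substring test such as the_word in text would also match 'This' and 'list', which is one reason a short word like 'is' is hard to count reliably.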