Retrieving links from a Google search with BeautifulSoup in Python

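I'm building a Twitter bot with Tweepy and BeautifulSoup4. I want to save the results of a request in a list, but my script no longer works (it worked a few days ago). I've been staring at it and can't figure out why. Here is my function: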


import requests
import tweepy
from bs4 import BeautifulSoup
import urllib
import os
from tweepy import StreamListener
from TwitterEngine import TwitterEngine
from ConfigEngine import TwitterAPIConfig
import urllib.request
import emoji
import random

# desktop user-agent
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
# mobile user-agent
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"


# Retrieve the links
def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='r'):
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

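The "url" argument passed in from the rest of the code is 100% correct. As output, I get "None". More precisely, execution stops right after the line "results = []" (so it never enters the for loop).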


Any ideas? Thanks a lot in advance!
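
A side note on the symptom described above: a Python function that ends without reaching a `return` statement yields `None`, so a `None` result would point to the status check failing, while an empty list would point to no matching elements. The sketch below only mimics the control flow of `parseLinks` without touching the network; `parse_links_sketch`, `status_code` and `matches` are hypothetical names introduced for illustration.

```python
def parse_links_sketch(status_code, matches):
    """Mimic the control flow of parseLinks without any network access.

    status_code and matches are hypothetical stand-ins for resp.status_code
    and the hrefs that soup.find_all(...) would have produced.
    """
    if status_code == 200:
        results = []
        for link in matches:
            results.append(link)
        return results
    # No return on this path, so Python returns None implicitly.


print(parse_links_sketch(429, []))  # non-200 status: prints None
print(parse_links_sketch(200, []))  # 200 but nothing matched: prints []
```

Printing `resp.status_code` before the `if` is a quick way to see which of the two cases applies.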


翻阅古今
1 Answer

梦里花落0921

It looks like Google changed the HTML markup on the page. Try changing the search from class="r" to class="rc":

import requests
from bs4 import BeautifulSoup

USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"

def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='rc'):  # <-- change 'r' to 'rc'
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

url = 'https://www.google.com/search?q=tree'
print(parseLinks(url))

Prints:

['https://en.wikipedia.org/wiki/Tree', 'https://simple.wikipedia.org/wiki/Tree', 'https://www.britannica.com/plant/tree', 'https://www.treepeople.org/tree-benefits', 'https://books.google.sk/books?id=yNGrqIaaYvgC&pg=PA20&lpg=PA20&dq=tree&source=bl&ots=_TP8PqSDlT&sig=ACfU3U16j9xRJgr31RraX0HlQZ0ryv9rcA&hl=sk&sa=X&ved=2ahUKEwjOq8fXyKjsAhXhAWMBHToMDw4Q6AEwG3oECAcQAg', 'https://teamtrees.org/', 'https://www.woodlandtrust.org.uk/trees-woods-and-wildlife/british-trees/a-z-of-british-trees/', 'https://artsandculture.google.com/entity/tree/m07j7r?categoryId=other']
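
Since the class name has already changed once, it may change again. One possible way to soften that, sketched here under assumptions, is to keep the parsing in its own function that takes an HTML string and accepts several candidate class names. The helper `extract_links`, its `class_names` parameter and the sample markup are all made up for this example and are not real Google output.

```python
from bs4 import BeautifulSoup


def extract_links(html, class_names=("r", "rc")):
    """Return the first href inside each <div> carrying any of class_names."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Passing a list to class_ matches a div that has any of the listed classes.
    for div in soup.find_all("div", class_=list(class_names)):
        anchor = div.find("a", href=True)  # first anchor that has an href
        if anchor:
            results.append(anchor["href"])
    return results


# Hypothetical markup, invented for this example (not real Google output).
sample = """
<div class="rc"><a href="https://example.com/a">A</a></div>
<div class="r"><a href="https://example.com/b">B</a></div>
<div class="other"><a href="https://example.com/c">C</a></div>
"""
print(extract_links(sample))
# prints ['https://example.com/a', 'https://example.com/b']
```

Because the function works on a string, it can be checked against a saved copy of the page whenever the markup changes, and it always returns a list rather than `None`.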