Dynamic Google search with Python/C#

I want to retrieve the Google search result count (for example, "106,000,000 results (0.58 seconds)"). I wrote this script in Python:


import requests, webbrowser
from bs4 import BeautifulSoup

user_input = input("Type in query: ")
print("Googling..")

link = "http://www.google.com/search?q=" + user_input
google_search = requests.get(link)
print(google_search.headers)

# write the response body to a file
with open("Output.html", "w") as text_file:
    print("{}".format(google_search.text), file=text_file)

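But when I look at the file, the result stats are missing. Is there any way to do this other than the Google Search API? That API is a poor fit because it is limited and doesn't even return the correct results. I mention both Python and C# because I know both languages.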


慕容3067478
2 answers

慕姐8265434

To get the correct results from Google, you have to set a proper User-Agent HTTP header:

import requests
from bs4 import BeautifulSoup

user_input = input("Type in query: ")
print("Googling for keyword={}..".format(user_input))

params = {
    'q': user_input,
    'hl': 'en'   # <-- set hl=en so the page (and the stats text) comes back in English
}
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'
}

google_search = requests.get("https://www.google.com/search", params=params, headers=headers)
soup = BeautifulSoup(google_search.content, 'html.parser')
print(soup.select_one('#result-stats').text)

Prints (for example):

Type in query: moon
Googling for keyword=moon..
About 1,720,000,000 results (0.99 seconds)
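If you need the count as a number rather than as a display string, the stats text can be parsed further. This is only a minimal sketch: `parse_result_stats` is a hypothetical helper name, and the regular expression assumes the English layout shown above ("About N results (S seconds)"), which is what `hl=en` is meant to produce. Other locales use different wording and separators.

```python
import re

# Hypothetical helper, not part of the answer above.
# Assumes the English stats layout: "About 1,720,000,000 results (0.99 seconds)".
_STATS_RE = re.compile(r"([\d,]+)\s+results?\s*\(([\d.]+)\s+seconds?\)")


def parse_result_stats(stats_text):
    """Return (result_count, seconds), or None if the text does not match."""
    match = _STATS_RE.search(stats_text)
    if match is None:
        return None
    count = int(match.group(1).replace(",", ""))  # drop thousands separators
    seconds = float(match.group(2))
    return count, seconds


print(parse_result_stats("About 1,720,000,000 results (0.99 seconds)"))
# (1720000000, 0.99)
```

Returning None instead of raising makes it easy to detect the case where the page layout differs from what the pattern expects.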

DIEA

Check out the SelectorGadget Chrome extension, which lets you grab CSS selectors by clicking on the desired element in your browser. If you'd rather not use it, you can test selectors in the dev tools console with the $$('SELECTOR') command.

CSS selectors are more flexible and readable; try the bs4 select_one() or select() methods instead of find()/findAll(). See the CSS selectors reference.

Also, you can pass URL query params like this:

params = {
  'q': 'the most amazing query in 2021',
  'hl': 'en',  # interface language; the original snippet had 'gl': 'hl', which is not a valid country code
}
requests.get(YOUR_URL, params=params)

Code:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

user_input = input("Type in query: ")
print(f"Googling... {user_input}")

params = {
  'q': user_input,
  'hl': 'en',  # the .replace("About", ...) calls below rely on English output
}

soup = BeautifulSoup(requests.get('https://www.google.com/search', headers=headers, params=params).text, 'lxml')

print(f"Found {soup.select_one('#result-stats').text}"
      .replace("About", "about")
      .replace(" (", " in ")
      .replace(")", ""))

Output:

Type in query: fus ro dah
Googling... fus ro dah
Found about 628,000 results in 0.36 seconds

Alternatively, you can use the Google Organic Results API from SerpApi to achieve the same thing. It's a paid API with a free plan. The main difference in your case is that you don't have to figure out why something isn't working as expected, because that has already been done for the end user. The only thing left to do is pull the data you want out of the structured JSON.

Code to integrate:

from serpapi import GoogleSearch
import os

user_input = input("Type in query: ")
print(f"Googling... {user_input}")

params = {
  "api_key": os.getenv("API_KEY"),
  "engine": "google",
  "q": user_input,
  "hl": "en"
}

search = GoogleSearch(params)
results = search.get_dict()

print(f"Total results: {results['search_information']['total_results']}\n"
      f"Time taken: {results['search_information']['time_taken_displayed']}")

Output:

Type in query: fus ro dah
Googling... fus ro dah
Total results: 663000
Time taken: 0.59 sec

Disclaimer: I work for SerpApi.
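A side note on why passing the query through params is preferable to concatenating the raw input onto the URL, as the question's script does: the query string gets URL-encoded, so spaces and characters like & do not corrupt the request. The sketch below uses only the standard library to show the kind of URL that results; `build_search_url` is a hypothetical helper name for illustration, and requests performs an equivalent encoding internally when given params.

```python
from urllib.parse import urlencode


# Hypothetical helper for illustration only.
def build_search_url(query, language="en"):
    """Build a search URL with the query parameters properly URL-encoded."""
    params = {"q": query, "hl": language}
    return "https://www.google.com/search?" + urlencode(params)


print(build_search_url("fus ro dah & more"))
# https://www.google.com/search?q=fus+ro+dah+%26+more&hl=en
```

Without encoding, the & in the query would be read as a parameter separator and everything after it would be dropped from the search term.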