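I have a list of search terms that I query in Google News.
The output gives me the links to all the matching news items in a single list.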
import requests
from bs4 import BeautifulSoup

rqsts_catdogtiger = ['Cat', 'Dog', 'Tiger']
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
page = 0  # first page of Google News (first 10 results)

# Build one search URL per term
url_list = []
for term in rqsts_catdogtiger[0:3]:
    url = 'https://www.google.com/search?q={}&tbm=nws&start={}'.format(term, page)  # request URL
    print(url)
    url_list.append(url)

# Download and parse each results page
soups = []
for link in url_list:
    response = requests.get(link, headers=headers, verify=False)
    soup = BeautifulSoup(response.text, 'html.parser')
    soups.append(soup)

def find_links():
    for soup in soups:
        results = soup.findAll("div", {'class': 'g'})  # class of the Google News result blocks
        for result in results:
            result_link = result.find('a').get('href')  # extract the link
            yield result_link

list_of_links = list(find_links())
list_of_links
The output is a flat list of 30 links: 10 each for Cat, Dog, and Tiger.
How can I combine this result into a pd.DataFrame like this:
   Request Name  Links
0  Cat           'https://www.polygon.com/2020/3/19/21187025/cats-2019-tom-hooper-mr-mistoffelees-broadway-musical',...
1  Dog           'https://nypost.com/2020/03/19/second-dog-in-hong-kong-tests-positive-for-coronavirus/',...
2  Tiger         'https://tvrain.ru/teleshow/doma_pogovorim/tiger_cave-504935/',...
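
One possible direction, as a minimal sketch rather than a definitive answer: the flat list no longer records which link came from which search term, so the links could be collected per term instead and turned into one row each. The names build_rows, to_dataframe, fetch_links and fake_results below are hypothetical and not part of the code above. fake_results stands in for the real scraping step so the sketch runs without network access, and to_dataframe assumes pandas is installed.

```python
def build_rows(terms, fetch_links):
    """Return one row per search term, keeping that term's links together.

    fetch_links is a hypothetical callable: it takes a term and returns
    an iterable of links found for that term only.
    """
    rows = []
    for term in terms:
        links = list(fetch_links(term))  # links belonging to this term
        rows.append({'Request Name': term, 'Links': links})
    return rows


def to_dataframe(rows):
    """Turn the rows into a DataFrame (assumes pandas is installed)."""
    import pandas as pd  # imported lazily so build_rows works without pandas
    return pd.DataFrame(rows, columns=['Request Name', 'Links'])


# Stand-in data for illustration only; the real links would come from
# parsing each term's results page separately.
fake_results = {
    'Cat': ['https://example.com/cat-1', 'https://example.com/cat-2'],
    'Dog': ['https://example.com/dog-1'],
    'Tiger': ['https://example.com/tiger-1'],
}

rows = build_rows(['Cat', 'Dog', 'Tiger'], lambda term: fake_results[term])
```

Calling to_dataframe(rows) would then give one row per term, with the Links column holding that term's list of links. In the real script, fetch_links would wrap the request and parsing steps for a single term.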