I want to scrape all URLs from multiple web pages. The script runs, but only the results from the last page end up in the output file.
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import re
import requests
urls=['https://www.metacritic.com/browse/movies/genre/date?page=0', 'https://www.metacritic.com/browse/movies/genre/date?page=2', '...']
for url in urls:
    response = requests.get(url)
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html_page = urlopen(req).read()
    soup = BeautifulSoup(html_page, features="html.parser")
    links = []
    for link in soup.findAll('a', attrs={'href': re.compile("^/movie/([a-zA-Z0-9\-])+$")}):
        links.append(link.get('href'))
    filename = 'output.csv'
    with open(filename, mode="w") as outfile:
        for s in links:
            outfile.write("%s\n" % s)
What am I missing here?
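I suspect the problem is that `links = []` and the `open(filename, mode="w")` block both sit inside the `for url in urls:` loop, so the list is reset and the file is overwritten on every pass. A minimal sketch of the fix I have in mind (collect all links first, write once after the loop; I also dropped the unused `requests.get` call), not yet verified:

```python
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import re

urls = ['https://www.metacritic.com/browse/movies/genre/date?page=0',
        'https://www.metacritic.com/browse/movies/genre/date?page=2']

links = []  # initialise once, outside the loop, so results accumulate across pages
for url in urls:
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html_page = urlopen(req).read()
    soup = BeautifulSoup(html_page, features="html.parser")
    for link in soup.findAll('a', attrs={'href': re.compile(r"^/movie/([a-zA-Z0-9\-])+$")}):
        links.append(link.get('href'))

# write once, after every page has been scraped
with open('output.csv', mode="w") as outfile:
    for s in links:
        outfile.write("%s\n" % s)
```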
It would be even nicer if I could read the URLs from a CSV file instead of hard-coding a list. But everything I have tried so far has been way off...
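This is the kind of thing I was picturing, assuming the URLs live in a file called urls.csv with one URL per row (the filename and layout are just my assumption):

```python
import csv

# assumed input: 'urls.csv' with one URL in the first column of each row
with open('urls.csv', newline='') as infile:
    urls = [row[0] for row in csv.reader(infile) if row]
```

The rest of the script would then loop over this `urls` list exactly as before.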