Python Web Scraping - Find only n items

Your code contains a nested (double) loop. A break statement only leaves the loop that directly contains it, so to exit both loops you need to use break twice, once in each loop. You can break on the same condition in both loops. Try this code:

    import re

    import bs4
    import requests

    keyword_list = ['health', 'Coronavirus', 'travel']
    articles_list = []
    base_url = 'https://news.google.com/search?q=TEST%20when%3A3d&hl=en-US&gl=US&ceid=US%3Aen'

    request = requests.get(base_url)
    webcontent = bs4.BeautifulSoup(request.content, 'lxml')

    maxcnt = 5  # max number of articles

    for ictr, i in enumerate(webcontent.findAll('div', {'jslog': '93789'})):
        if len(articles_list) == maxcnt:
            break  # exit outer loop
        # limit=1 returns at most one matching link per article block
        for link in i.findAll('a', attrs={'href': re.compile("/articles/")}, limit=1):
            if any(keyword in i.select_one('h3').getText() for keyword in keyword_list):
                articles_list.append((i.select_one('h3').getText(),
                                      "https://news.google.com" + str(link.get('href'))))
                if len(articles_list) == maxcnt:
                    break  # exit inner loop

    print(str(len(articles_list)), 'articles')
    print('\n'.join(['> ' + a[0] for a in articles_list]))  # article titles

Output:

    5 articles
    > Why Coronavirus Tests Come With Surprise Bills
    > It’s Not Easy to Get a Coronavirus Test for a Child
    > Britain’s health secretary says the asymptomatic don’t need tests. Critics say that sends a mixed message.
    > Coronavirus testing shifts focus from precision to rapidity
    > Coronavirus testing at Boston lab suspended after nearly 400 false positives
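As an alternative to breaking out of each loop separately, the nested loops can be moved into a generator and the caller can stop after n results with itertools.islice. The following is only a minimal sketch of that idea: the function name matching_titles and the sample data are hypothetical stand-ins for the scraped headlines, and no network request is made.

```python
from itertools import islice


def matching_titles(sections, keywords):
    """Yield each title that contains at least one keyword (hypothetical helper)."""
    for section in sections:        # outer loop
        for title in section:       # inner loop
            if any(keyword in title for keyword in keywords):
                yield title


# Hypothetical sample data standing in for scraped article titles
sections = [
    ["Coronavirus testing update", "Local sports roundup"],
    ["New travel rules announced", "health officials issue guidance"],
    ["Coronavirus cases fall", "Stock markets rally"],
]
keywords = ["health", "Coronavirus", "travel"]
maxcnt = 3  # max number of articles

# islice stops pulling from the generator after maxcnt items,
# so neither loop needs an explicit break.
first_matches = list(islice(matching_titles(sections, keywords), maxcnt))

print(len(first_matches), "articles")
print("\n".join("> " + title for title in first_matches))
```

Because the generator is lazy, the remaining sections are never examined once maxcnt matches have been collected, which has the same effect as the two break statements.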
