Best practice with Python BS4 for finding/appending only elements whose attribute or text contains a specific string

My current discord.py (asyncio) code, which sends a link to a random New York Times article:


@client.command()
async def news(ctx):
    url = 'https://www.nytimes.com/section/us'
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    articles = soup.find_all('a')
    newslist = []
    for article in articles:
        newslist.append(article['href'])
    news = random.choice(newslist)
    while "2020" not in news:
        news = random.choice(newslist)
    else:
        await ctx.send('https://www.nytimes.com' + news)
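Because every link on the page is appended to newslist, the code checks the randomly chosen link for "2020" (which indicates it is an article) before calling await ctx.send. Is there a way to make find_all('a') return only links that contain "2020", or to append only links that contain "2020"? Would either approach be more efficient?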
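The second option in the question, appending only the links that contain "2020", could be sketched roughly as follows. The sample markup and the helper name `links_containing` are made up for illustration and are not part of the original code; `href=True` is used so that `<a>` tags without an `href` attribute are skipped rather than raising `KeyError` on `article['href']`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<a href="/2020/03/18/us/coronavirus-immigrants.html">Article</a>
<a href="/section/us">Section</a>
<a name="top">Anchor without href</a>
"""


def links_containing(html, needle):
    """Return the href of every <a> tag whose href contains `needle`."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually have an href attribute.
    return [a["href"] for a in soup.find_all("a", href=True) if needle in a["href"]]


print(links_containing(SAMPLE_HTML, "2020"))
```

With a list built this way, a single `random.choice` call is enough and the `while` retry loop is no longer needed.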


尚方宝剑之说
1 Answer

慕容3067478

import re

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.nytimes.com/section/us")
soup = BeautifulSoup(r.content, 'html.parser')

urls = []
# Match only <a> tags whose href contains "2020".
for item in soup.find_all("a", href=re.compile("2020")):
    item = item.get("href")
    # Turn site-relative links into absolute URLs.
    if not item.startswith("http"):
        item = f"https://www.nytimes.com{item}"
    # Skip links that were already collected.
    if item not in urls:
        urls.append(item)
        print(item)

Output:

https://www.nytimes.com/2020/03/18/us/coronavirus-immigrants.html
https://www.nytimes.com/2020/03/18/us/coronavirus-nebraska-biocontainment.html
https://www.nytimes.com/2020/03/18/us/coronavirus-janitors-cleaners.html
https://www.nytimes.com/2020/03/18/us/small-business-coronavirus-charlotte.html
https://www.nytimes.com/2020/03/19/us/politics/coronavirus-heaven-frilot-mark-frilot.html
https://www.nytimes.com/2020/03/19/us/politics/coronavirus-state-department-travel.html
https://www.nytimes.com/2020/03/19/us/politics/coronavirus-congress-voting.html
https://www.nytimes.com/2020/03/19/us/coronavirus-foster-pets.html
https://www.nytimes.com/2020/03/19/books/molly-brodak-dies.html
https://www.nytimes.com/2020/03/19/us/politics/joe-biden-vice-president.html
https://www.nytimes.com/2020/03/19/us/coronavirus-location-tracking.html
https://www.nytimes.com/2020/03/19/climate/us-flood-season-forescast.html
https://www.nytimes.com/2020/03/19/health/coronavirus-masks-shortage.html
https://www.nytimes.com/2020/03/19/health/coronavirus-travel-ban.html
https://www.nytimes.com/2020/03/19/arts/mal-sharpe-dead.html
https://www.nytimes.com/2020/03/19/business/coronavirus-unemployment-states.html
https://www.nytimes.com/2020/03/19/us/work-from-home-mothers-coronavirus-covid19.html
https://www.nytimes.com/2020/03/19/us/politics/1000-checks-coronavirus-stimulus.html
https://www.nytimes.com/news-event/2020-election
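To connect the answer back to the bot command in the question, here is a minimal sketch of how the regex filter might be wrapped up so that one random article can be picked without a retry loop. The function names `article_urls` and `pick_article` and the `BASE_URL` constant are hypothetical, and `urljoin` is swapped in for the manual `startswith("http")` check; treat it as one possible arrangement, not the answerer's method.

```python
import random
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.nytimes.com"  # assumed site root for relative links


def article_urls(html, pattern="2020"):
    """Collect unique absolute URLs from <a> tags whose href matches `pattern`."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # Passing a compiled regex as href makes find_all do the filtering.
    for tag in soup.find_all("a", href=re.compile(pattern)):
        url = urljoin(BASE_URL, tag["href"])  # absolute hrefs pass through unchanged
        if url not in urls:
            urls.append(url)
    return urls


def pick_article(html):
    """Return one random matching URL, or None when nothing matches."""
    urls = article_urls(html)
    return random.choice(urls) if urls else None
```

Inside the command, something like `await ctx.send(pick_article(r.content))` would then replace the `while` loop, with a separate branch for the `None` case so the bot never sends an empty message.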