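Getting the second element from an HTML tag with Python/BeautifulSoup

I want to scrape elements from a page, for example this one: https://www.aacr.org/?s=breast+cancer&search_type=global

The HTML tag for each title contains a link plus the title text attached to it. When I run my code, it prints the HTML (first position) and then the title (second position, which is what I want).

For example, print returns -> <a href="https://www.aacr.org/patients-caregivers/cancer/breast-cancer/" title="Breast Cancer">Breast Cancer,

I only want the bold/second element (the title text). Any help? Here is my code: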


import requests
import time
from bs4 import BeautifulSoup
import pandas as pd

productlinks = []
sam=[]
for x in range(1,3):
    url=f'https://www.aacr.org/page/{x}/?s=breast+cancer&search_type=global'
    r=requests.get(url)
    soup=BeautifulSoup(r.content,'html.parser')
    productlist=soup.find_all('div',class_='blog-content')
    for item in productlist:
        title=soup.find_all('h3')
        print(title)


幕布斯7119047
2 answers

阿晨1998

You have to iterate one more time to get what you want: loop over each h3 and read the title attribute of its a tag. (I kept your code intact and only added the extra loop, so you can see how to do this in general and not just for this specific use case.)

import requests
import time
from bs4 import BeautifulSoup
import pandas as pd

productlinks = []
sam=[]
for x in range(1,3):
    url=f'https://www.aacr.org/page/{x}/?s=breast+cancer&search_type=global'
    r=requests.get(url)
    soup=BeautifulSoup(r.content,'html.parser')
    productlist=soup.find_all('div',class_='blog-content')
    for item in productlist:
        title=soup.find_all('h3')
        for single in title:
            # each h3 wraps an <a>; read that link's title attribute
            print(single.a['title'])

Result:

Breast Cancer
Male Breast Cancer
Breast Cancer Prevention (PDQ®)
Breast Cancer Screening (PDQ®)
Breast Cancer Treatment During Pregnancy (PDQ®)
Breast Cancer Treatment (PDQ®)
Male Breast Cancer Treatment (PDQ®)
Carcinoma of Unknown Primary
Overcoming Triple-Negative Breast Cancer
Living with Metastatic Breast Cancer
Surviving Metastatic Breast Cancer; Advocating for Other Cancer Patients
Living With Stage 4 Breast Cancer
Choosing to Enjoy Life Despite Metastatic Breast Cancer
A Breast and Colon Cancer Survivor Supports Cancer Research
Pedaling for Cancer Research
Emily Garnett
Supporting Increased Funding for Clinical Trials
Raising Awareness of Male Breast Cancer
Keeping Breast Cancer at Bay with Immunotherapy
Recovering after Breast Cancer Treatment Thanks to Prehab and Rehab
Takae Brewer, MD
Thankful for Clinical Trials
Bianca Lundien Kennedy
Gina Favors
Running to Beat Leukemia (and All Cancers)
Patricia Fox
Survivor Profile: An Unlikely Pivot
Program
Advances in Breast Cancer Research
Program
Breast Cancer
Male Breast Cancer
Breast Cancer Prevention (PDQ®)
Breast Cancer Screening (PDQ®)
Breast Cancer Treatment During Pregnancy (PDQ®)
Breast Cancer Treatment (PDQ®)
Male Breast Cancer Treatment (PDQ®)
Carcinoma of Unknown Primary
Overcoming Triple-Negative Breast Cancer
Living with Metastatic Breast Cancer
Surviving Metastatic Breast Cancer; Advocating for Other Cancer Patients
Living With Stage 4 Breast Cancer
Choosing to Enjoy Life Despite Metastatic Breast Cancer
A Breast and Colon Cancer Survivor Supports Cancer Research
Pedaling for Cancer Research
Emily Garnett
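
The titles in the result above start over partway through because `soup.find_all('h3')` is called inside the loop and searches the whole page each time, not just the current `item`. The following is a small offline sketch of that effect. The HTML is made up and only assumes the `div.blog-content` > `h3` > `a` layout implied by the question; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described in the question.
SAMPLE_HTML = """
<div class="blog-content"><h3><a href="/a/" title="Breast Cancer">Breast Cancer</a></h3></div>
<div class="blog-content"><h3><a href="/b/" title="Male Breast Cancer">Male Breast Cancer</a></h3></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
productlist = soup.find_all("div", class_="blog-content")

# Searching from `soup` inside the loop revisits every <h3> on the page
# once per item, so each title appears len(productlist) times.
page_wide = [h3.a["title"] for item in productlist for h3 in soup.find_all("h3")]

# Searching from `item` only looks inside the current result block.
scoped = [h3.a["title"] for item in productlist for h3 in item.find_all("h3")]

print(page_wide)
print(scoped)
```

If the real page follows the assumed layout, replacing `soup.find_all('h3')` with `item.find_all('h3')` in the loop should print each title once.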

三国纷争

To get the title attribute, just change the last for loop to:

for item in productlist:
    a_tag = item.find('a')
    print(a_tag['title'])

Output:

Breast Cancer
Male Breast Cancer
Breast Cancer Prevention (PDQ®)
Breast Cancer Screening (PDQ®)
Breast Cancer Treatment During Pregnancy (PDQ®)
Breast Cancer Treatment (PDQ®)
Male Breast Cancer Treatment (PDQ®)
Carcinoma of Unknown Primary
Overcoming Triple-Negative Breast Cancer
Living with Metastatic Breast Cancer
Surviving Metastatic Breast Cancer; Advocating for Other Cancer Patients
Living With Stage 4 Breast Cancer
Choosing to Enjoy Life Despite Metastatic Breast Cancer
A Breast and Colon Cancer Survivor Supports Cancer Research
Pedaling for Cancer Research
Emily Garnett
Supporting Increased Funding for Clinical Trials
Raising Awareness of Male Breast Cancer
Keeping Breast Cancer at Bay with Immunotherapy
Recovering after Breast Cancer Treatment Thanks to Prehab and Rehab
Takae Brewer, MD
Thankful for Clinical Trials
Bianca Lundien Kennedy
Gina Favors
Running to Beat Leukemia (and All Cancers)
Patricia Fox
Survivor Profile: An Unlikely Pivot
Program
Advances in Breast Cancer Research
Program
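
One caveat with indexing `a_tag['title']` directly: if a result block has no link, `item.find('a')` returns `None` and the lookup raises a `TypeError`; if the link has no `title` attribute, it raises a `KeyError`. A minimal defensive sketch follows. The helper name `extract_titles` and the sample HTML are hypothetical and used only for illustration.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the title attribute of the first link in each result block."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for item in soup.find_all("div", class_="blog-content"):
        a_tag = item.find("a")
        # Skip blocks without a link or without a non-empty title attribute.
        if a_tag is not None and a_tag.get("title"):
            titles.append(a_tag["title"].strip())
    return titles


# Made-up markup: one normal block, one without a link, one without a title.
sample = """
<div class="blog-content"><h3><a href="/a/" title="Breast Cancer ">Breast Cancer</a></h3></div>
<div class="blog-content"><h3>No link here</h3></div>
<div class="blog-content"><h3><a href="/c/">No title attribute</a></h3></div>
"""

print(extract_titles(sample))
```

Returning a list instead of printing also makes it easier to collect the titles from several pages before further processing.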