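I want to find a <span> tag inside an <h1> tag that contains multiple <span> tags

What I want to do is select the second span and grab its text so I can print it. The HTML and my BeautifulSoup code are below.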


#HTML code


<h1 id="productTitle">
   <a href="https://www.example.com/product/">
       <span id="productBrand">BRAND</span>
   </a>
   <span>PRODUCT TITLE </span>
</h1>

#BeautifulSoup code


for h1 in soup.find_all('h1', id="productTitle"):
    productTitle = h1.find('span').text
    print(productTitle)


阿晨1998
180 views, 2 answers

LEATH

Ideally (though it is not always the case) an id is unique, which means find_all is probably unnecessary. With bs4 4.7.1+, you can use :not to exclude the child span that has an id:

from bs4 import BeautifulSoup as bs

html = '''<h1 id="productTitle">
   <a href="https://www.example.com/product/">
       <span id="productBrand">BRAND</span>
   </a>
   <span>PRODUCT TITLE </span>
</h1>'''
soup = bs(html, 'lxml')
print(soup.select_one('#productTitle span:not([id])').text)

You can also use nth-child:

print(soup.select_one('#productTitle span:nth-child(2)').text)

or

print(soup.select_one('#productTitle span:nth-child(even)').text)

or even an adjacent sibling combinator to get the span that follows the child a:

print(soup.select_one('#productTitle a + span').text)

or chain next_sibling:

# The first next_sibling is the whitespace text node between </a> and <span>
print(soup.select_one('#productTitle a').next_sibling.next_sibling.text)
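As a complement to the selectors above, here is a minimal sketch that wraps the lookup in a small function and handles the case where nothing matches. The function name extract_title and the child-combinator selector '#productTitle > span' are my own additions, not part of the answer; the sketch also uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

HTML = '''<h1 id="productTitle">
   <a href="https://www.example.com/product/">
       <span id="productBrand">BRAND</span>
   </a>
   <span>PRODUCT TITLE </span>
</h1>'''


def extract_title(markup):
    """Return the stripped text of the <span> that is a direct child of
    #productTitle, or None if no such element exists.

    extract_title is a hypothetical helper written for illustration.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    # '>' matches direct children only, so the brand <span> nested in <a> is skipped
    node = soup.select_one('#productTitle > span')
    if node is None:
        return None
    # strip=True removes the trailing space in 'PRODUCT TITLE '
    return node.get_text(strip=True)


print(extract_title(HTML))
```

Checking for None before reading the text avoids an AttributeError on pages where the title span is missing.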

素胚勾勒不出你

This gets all the fields you need inside the h1 tag.

Python code:

from bs4 import BeautifulSoup

text = '''<h1 id="productTitle">
   <a href="https://www.example.com/product/">
       <span id="productBrand">BRAND</span>
   </a>
   <span>PRODUCT TITLE </span>
</h1>'''
soup = BeautifulSoup(text, features='html.parser')

#BeautifulSoup code
for h1 in soup.find_all('h1', id="productTitle"):
    spans = h1.find_all('span')
    print('productBrand  == > {}'.format(spans[0].text))
    print('productTitle  == > {}'.format(spans[1].text))

To get all the spans inside the h1:

for h1 in soup.find_all('h1', id="productTitle"):
    for i, span in enumerate(h1.find_all('span')):
        print('span {} == > {}'.format(i, span.text))
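Indexing spans[1] raises an IndexError if the h1 holds fewer than two spans. The sketch below shows two variations that I am adding for illustration (they are not from the answer): find with recursive=False, which only looks at direct children of the h1, and a length check before indexing. The variable names are assumptions of mine.

```python
from bs4 import BeautifulSoup

text = '''<h1 id="productTitle">
   <a href="https://www.example.com/product/">
       <span id="productBrand">BRAND</span>
   </a>
   <span>PRODUCT TITLE </span>
</h1>'''

soup = BeautifulSoup(text, features='html.parser')
h1 = soup.find('h1', id='productTitle')

# recursive=False restricts the search to direct children of <h1>,
# so the brand <span> nested inside <a> is not considered.
title_span = h1.find('span', recursive=False) if h1 else None
product_title = title_span.get_text(strip=True) if title_span else None

# Guarded version of the spans[1] lookup: only index when two spans exist.
spans = h1.find_all('span') if h1 else []
second_text = spans[1].get_text(strip=True) if len(spans) > 1 else None

print(product_title)
print(second_text)
```

Both variables hold the same text for this markup; which approach fits better depends on whether the page structure or the span order is more stable.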