I want to find a <span> tag that sits inside an <h1> tag containing multiple <span> tags


What I want to do is select the second span and grab its text so I can print it. Below are the HTML code and the BeautifulSoup code.

#HTML code

<h1 id="productTitle">
  <a href="https://www.example.com/product/">
    <span id="productBrand">BRAND</span>
  </a>
  <span>PRODUCT TITLE </span>
</h1>

#BeautifulSoup code

for h1 in soup.find_all('h1', id="productTitle"):
    productTitle = h1.find('span').text
    print(productTitle)

阿晨1998


2 Answers

LEATH

Ideally (though it is not always the case) an id is unique, which means find_all may not be necessary. With bs4 4.7.1+, you can use :not to exclude the child span that has an id:

    from bs4 import BeautifulSoup as bs

    html = '''<h1 id="productTitle">
       <a href="https://www.example.com/product/">
             <span id="productBrand">BRAND</span>
       </a>
             <span>PRODUCT TITLE </span>
    </h1>'''

    soup = bs(html, 'lxml')
    print(soup.select_one('#productTitle span:not([id])').text)

You can also use nth-child:

    print(soup.select_one('#productTitle span:nth-child(2)').text)

or

    print(soup.select_one('#productTitle span:nth-child(even)').text)

or even an adjacent sibling combinator, to get the span that directly follows the a child:

    print(soup.select_one('#productTitle a + span').text)

or chain next_sibling (the first next_sibling is the whitespace text node after the a tag, the second is the span):

    print(soup.select_one('#productTitle a').next_sibling.next_sibling.text)
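One more variation on the same idea, offered as a minimal sketch rather than part of the original answer: the wanted span is a direct child of the h1, while the brand span is nested inside the a tag, so restricting the search to direct children also picks the right one. The variable names (`title_span`, `same_span`) are illustrative, and `html.parser` is used here only to avoid depending on lxml.

```python
from bs4 import BeautifulSoup

html = '''<h1 id="productTitle">
  <a href="https://www.example.com/product/">
    <span id="productBrand">BRAND</span>
  </a>
  <span>PRODUCT TITLE </span>
</h1>'''

soup = BeautifulSoup(html, 'html.parser')
h1 = soup.find('h1', id='productTitle')

# recursive=False limits the search to direct children of <h1>,
# so the <span> nested inside <a> is skipped.
title_span = h1.find('span', recursive=False)
title = title_span.get_text(strip=True)
print(title)

# The CSS child combinator (>) expresses the same restriction.
same_span = soup.select_one('#productTitle > span')
print(same_span.get_text(strip=True))
```

This relies on the page keeping the brand span nested and the title span as a direct child; if the markup changes, the :not([id]) selector above may be the more robust choice.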


素胚勾勒不出你

This gets all the fields you need from inside the h1 tag.

Python code:

    from bs4 import BeautifulSoup

    text = '''<h1 id="productTitle">
       <a href="https://www.example.com/product/">
             <span id="productBrand">BRAND</span>
       </a>
             <span>PRODUCT TITLE </span>
    </h1>'''

    soup = BeautifulSoup(text, features='html.parser')

    #BeautifulSoup code
    for h1 in soup.find_all('h1', id="productTitle"):
        # find_all returns every descendant span in document order
        spans = h1.find_all('span')
        print('productBrand  == > {}'.format(spans[0].text))
        print('productTitle  == > {}'.format(spans[1].text))

To get every span inside the h1:

    for h1 in soup.find_all('h1', id="productTitle"):
        for i, span in enumerate(h1.find_all('span')):
            print('span {} == > {}'.format(i, span.text))
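Indexing spans[1] directly raises an IndexError when a page has fewer than two spans, and calling a method on the result of find fails when the h1 is missing. The following is a rough sketch of a more defensive version of the same find_all approach; `span_texts` is a hypothetical helper name, not something from bs4 or the original answer.

```python
from bs4 import BeautifulSoup


def span_texts(markup, h1_id='productTitle'):
    """Return the stripped text of every <span> under the <h1>, or [] if there is no such <h1>."""
    soup = BeautifulSoup(markup, 'html.parser')
    h1 = soup.find('h1', id=h1_id)
    if h1 is None:
        return []
    return [span.get_text(strip=True) for span in h1.find_all('span')]


html = '''<h1 id="productTitle">
  <a href="https://www.example.com/product/">
    <span id="productBrand">BRAND</span>
  </a>
  <span>PRODUCT TITLE </span>
</h1>'''

texts = span_texts(html)
# Guard the index instead of assuming the second span exists.
title = texts[1] if len(texts) > 1 else None
print(title)
```

Using get_text(strip=True) also drops the trailing space that .text keeps in "PRODUCT TITLE ".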

