How do I select the second div from this code when it has no identifier?

I can't figure out what I need to do to get the second div inside the outer div using bs4. I need to get the div that contains the date. Thanks for your help.


Here is the code:


<div class="featured-item-meta">

    <div><strong>Published:</strong></div>

    <div>October 14, 2015</div>

    <ul class="creatorList">

        <li>

            <div><strong>Writer:</strong></div>

            <div><a href="https://www.marvel.com/comics/creators/10329/g_willow_wilson">G. Willow Wilson</a>, <a href="https://www.marvel.com/comics/creators/12441/marguerite_bennett">Marguerite  Bennett</a></div>

        </li>

        <li>

            <div><strong>Cover Artist:</strong></div>

            <div><a href="https://www.marvel.com/comics/creators/8825/jorge_molina">Jorge  Molina</a></div>

        </li>

    </ul>

</div>


慕容森
Views: 214
3 Answers

人到中年有点甜

With bs4 4.7.1+ this is easy. You can use :has and :contains to select the parent div that has a strong child containing the string "Published:", then use the adjacent sibling combinator (+) to get the next div.

from bs4 import BeautifulSoup as bs

html = '''<div class="featured-item-meta">
    <div><strong>Published:</strong></div>
    <div>October 14, 2015</div>
    <ul class="creatorList">
        <li>
            <div><strong>Writer:</strong></div>
            <div><a href="https://www.marvel.com/comics/creators/10329/g_willow_wilson">G. Willow Wilson</a>, <a href="https://www.marvel.com/comics/creators/12441/marguerite_bennett">Marguerite  Bennett</a></div>
        </li>
        <li>
            <div><strong>Cover Artist:</strong></div>
            <div><a href="https://www.marvel.com/comics/creators/8825/jorge_molina">Jorge  Molina</a></div>
        </li>
    </ul>
</div>'''

# The 'lxml' parser must be installed separately.
soup = bs(html, 'lxml')
# Match the div that holds <strong>Published:</strong>, then take the div right after it.
# Newer Soup Sieve releases prefer :-soup-contains over the deprecated :contains.
print(soup.select_one('div:has(strong:contains("Published:")) + div').text)
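For readers who prefer to avoid CSS pseudo-classes, the same label-then-sibling idea can be sketched with bs4's navigation methods. This is only an illustration under assumptions: the HTML is shortened from the question, the built-in 'html.parser' is used instead of lxml, and the variable names label and date_div are made up for the example.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumption: same structure).
html = """<div class="featured-item-meta">
    <div><strong>Published:</strong></div>
    <div>October 14, 2015</div>
    <ul class="creatorList">
        <li>
            <div><strong>Writer:</strong></div>
            <div>G. Willow Wilson</div>
        </li>
    </ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Find the <strong> whose text is exactly "Published:".
label = soup.find("strong", string="Published:")

# Its parent is the label div; the next sibling <div> holds the date.
# find_next_sibling skips the whitespace text nodes between the two divs.
date_div = label.parent.find_next_sibling("div")

print(date_div.get_text(strip=True))
```

Because the lookup is keyed on the label text rather than on position, it should keep working if other divs are added before the date, as long as the "Published:" label itself stays unchanged.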

慕侠2389804

Here is a workaround:

from bs4 import BeautifulSoup

text = '''<div class="featured-item-meta">
<div><strong>Published:</strong></div>
<div>October 14, 2015</div>
<ul class="creatorList">
    <li>
        <div><strong>Writer:</strong></div>
        <div><a href="https://www.marvel.com/comics/creators/10329/g_willow_wilson">G. Willow Wilson</a>, <a href="https://www.marvel.com/comics/creators/12441/marguerite_bennett">Marguerite  Bennett</a></div>
    </li>
    <li>
        <div><strong>Cover Artist:</strong></div>
        <div><a href="https://www.marvel.com/comics/creators/8825/jorge_molina">Jorge  Molina</a></div>
    </li>
</ul>
</div>'''

soup = BeautifulSoup(text, 'html.parser')
# find_all('div') returns every descendant div in document order; index 1 is the date div.
print(soup.find('div', attrs={'class': 'featured-item-meta'})
          .find_all('div')[1].text)

Output:

October 14, 2015
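Indexing into find_all('div') counts every nested div, so the position can shift if the markup changes. One possible refinement, sketched here under the assumption that the date is always the second direct child div of the container, is to restrict the search to direct children. The HTML is shortened from the question and the names meta, direct_divs and second are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumption: same structure).
html = """<div class="featured-item-meta">
    <div><strong>Published:</strong></div>
    <div>October 14, 2015</div>
    <ul class="creatorList">
        <li>
            <div><strong>Writer:</strong></div>
            <div>G. Willow Wilson</div>
        </li>
    </ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")
meta = soup.find("div", class_="featured-item-meta")

# recursive=False limits the search to direct children,
# so the divs nested inside the <ul> are not counted.
direct_divs = meta.find_all("div", recursive=False)
second = direct_divs[1].get_text(strip=True)
print(second)

# The same idea as a CSS selector: the second div that is a direct child.
via_css = soup.select_one("div.featured-item-meta > div:nth-of-type(2)")
print(via_css.get_text(strip=True))
```

Both variants still depend on position, so they suit pages where the layout is stable but the label text might differ.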

慕码人2483693

from bs4 import BeautifulSoup as bsp

s = '''<div class="featured-item-meta">
    <div><strong>Published:</strong></div>
    <div>October 14, 2015</div>
    <ul class="creatorList">
        <li>
            <div><strong>Writer:</strong></div>
            <div><a href="https://www.marvel.com/comics/creators/10329/g_willow_wilson">G. Willow Wilson</a>, <a href="https://www.marvel.com/comics/creators/12441/marguerite_bennett">Marguerite  Bennett</a></div>
        </li>
        <li>
            <div><strong>Cover Artist:</strong></div>
            <div><a href="https://www.marvel.com/comics/creators/8825/jorge_molina">Jorge  Molina</a></div>
        </li>
    </ul>
</div>'''

# Name the parser explicitly to avoid bs4's "no parser specified" warning.
# findChildren is the legacy alias of find_all; index 1 is the second nested div.
# This prints the whole tag; append .text to get only the date string.
print(bsp(s, 'html.parser').find('div').findChildren('div')[1])

The code may need slight changes depending on your full web page and its structure.
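Since the real page may differ from the snippet, it can help to guard against missing elements instead of letting an IndexError or AttributeError surface. The helper below is a hypothetical sketch, not part of any answer above: the function name published_date and its None-on-failure behaviour are assumptions chosen for illustration.

```python
from typing import Optional

from bs4 import BeautifulSoup


def published_date(html: str) -> Optional[str]:
    """Return the text of the second direct child div, or None if absent.

    Hypothetical helper: assumes the date lives in the second direct
    child <div> of the element with class "featured-item-meta".
    """
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("div", class_="featured-item-meta")
    if meta is None:
        # The container is not on this page.
        return None
    divs = meta.find_all("div", recursive=False)
    if len(divs) < 2:
        # The container exists but has no second div.
        return None
    return divs[1].get_text(strip=True)


sample = (
    '<div class="featured-item-meta">'
    "<div><strong>Published:</strong></div>"
    "<div>October 14, 2015</div>"
    "</div>"
)
print(published_date(sample))
print(published_date("<p>no metadata here</p>"))
```

Returning None keeps the calling code simple when scraping many pages, because a page with a different layout can be skipped or logged rather than crashing the run.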