Parsing HTML lists with BeautifulSoup

I'm new to parsing. I have a simple HTML page whose lists have no class attributes, for example:

    <h2><a href="..">Title 1</a></h2>
    <ol>
        <li>Line 1..</li>
        <li>Line 2...</li>
        ...
    </ol>
    <h2><a href="..">Title 2</a></h2>
    <ol>
        <li>Line 2-1..</li>
        <li>Line 2-2...</li>
        ...
    </ol>
    ...

and so on..

I run this code:


    import requests
    from bs4 import BeautifulSoup as BS

    r = requests.get('http://...')
    html = BS(r.content, 'html.parser')

    H2 = html.find_all('h2')
    for h2 in H2:
        title = h2.text
        print(title)

This gets the titles, but how can I get the <ol> list that belongs to each title within the same loop?


慕田峪9158850
2 Answers

largeQ

A simple way is to use zip. Try:

    import requests
    from bs4 import BeautifulSoup as BS

    source = '''<h2><a href="..">Title 1</a></h2>
        <ol>
            <li>Line 1..</li>
            <li>Line 2...</li>
        </ol>
        <h2><a href="..">Title 2</a></h2>
        <ol>
            <li>Line 2-1..</li>
            <li>Line 2-2...</li>
        </ol>'''

    html = BS(source, 'html.parser')

    for title, element in zip(html.find_all('h2'), html.find_all('ol')):
        print(title.text, element.text)

Result (leading whitespace trimmed):

    Title 1
    Line 1..
    Line 2...

    Title 2
    Line 2-1..
    Line 2-2...

Note: zip pairs the headings and lists by position and stops at the shorter of the two sequences. If the number of <h2> and <ol> elements differs, you can use itertools.zip_longest instead of zip.
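To illustrate the zip_longest note, here is a minimal sketch. The sample markup is hypothetical (the second heading deliberately has no list), and the None checks are an assumption about how one might want to handle the padding:

```python
from itertools import zip_longest

from bs4 import BeautifulSoup

# Hypothetical sample: the last heading has no <ol> after it.
source = '''<h2><a href="..">Title 1</a></h2>
<ol>
    <li>Line 1</li>
    <li>Line 2</li>
</ol>
<h2><a href="..">Title 2</a></h2>'''

soup = BeautifulSoup(source, 'html.parser')

pairs = []
for title, element in zip_longest(soup.find_all('h2'), soup.find_all('ol')):
    # zip_longest pads the shorter sequence with None, so guard both sides.
    heading = title.get_text(strip=True) if title is not None else None
    if element is not None:
        items = [li.get_text(strip=True) for li in element.find_all('li')]
    else:
        items = []
    pairs.append((heading, items))

print(pairs)
```

The pairing is still purely positional: if a heading in the middle of the page has no list, every later list shifts onto the wrong heading, so this only helps when the missing elements are at the end.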

HUX布斯

Another solution: you can use .find_previous:

    from bs4 import BeautifulSoup

    txt = '''
        <h2><a href="..">Title 1</a></h2>
        <ol>
            <li>Line 1</li>
            <li>Line 2</li>
            ...
        </ol>
        <h2><a href="..">Title 2</a></h2>
        <ol>
            <li>Line 2-1</li>
            <li>Line 2-2</li>
            ...
        </ol>'''

    soup = BeautifulSoup(txt, 'html.parser')

    out = {}
    for li in soup.select('ol li'):
        out.setdefault(li.find_previous('h2').text, []).append(li.text)

    print(out)

For each <li>, find_previous('h2') returns the closest <h2> that precedes it in the document, so the items end up grouped under their own heading.

Prints:

    {'Title 1': ['Line 1', 'Line 2'],
     'Title 2': ['Line 2-1', 'Line 2-2']}
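Since the question asks for the list inside the same loop over the headings, a third option (not taken from either answer above) is to look forward from each <h2> with find_next_sibling. This is only a sketch: it assumes each <h2> and its <ol> share the same parent, as in the question's snippet, and the "Title 3" heading without a list is a hypothetical addition to show the fallback.

```python
from bs4 import BeautifulSoup

# Sample modelled on the question; "Title 3" is hypothetical and has no list.
source = '''<h2><a href="..">Title 1</a></h2>
<ol>
    <li>Line 1</li>
    <li>Line 2</li>
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
    <li>Line 2-1</li>
    <li>Line 2-2</li>
</ol>
<h2><a href="..">Title 3</a></h2>
<p>No list here</p>'''

soup = BeautifulSoup(source, 'html.parser')

result = {}
for h2 in soup.find_all('h2'):
    title = h2.get_text(strip=True)
    # Next sibling that is either an <ol> or another <h2>.
    # If an <h2> (or nothing) comes first, this heading has no list of its own.
    nxt = h2.find_next_sibling(['h2', 'ol'])
    if nxt is not None and nxt.name == 'ol':
        result[title] = [li.get_text(strip=True) for li in nxt.find_all('li')]
    else:
        result[title] = []

print(result)
```

Unlike the zip approach, a heading without a list does not shift the later pairs, because each heading only looks at what follows it directly.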