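Python: How do I extract data that shares the same class name in BeautifulSoup?

I'm trying to extract data using the BeautifulSoup library in Python. I use zip and soup.select to do the extraction.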


My HTML data looks like this:


<li>
    <ul class="features">
        <li>Year: <strong>2016</strong></li>
        <li>Kilometers: <strong>81,000</strong></li>
    </ul>
    <ul class="features">
        <li>Doors: <strong>2 door</strong></li>
        <li>Color: <strong>White</strong></li>
    </ul>
    <ul class="features">
    </ul>
</li>


Here I want to get Year, Kilometers, Doors, and Color in separate variables. But when I run my code, the values all come out together.


My code:



for title, price, date, features in zip(soup.select('.listing-item .title'),
                                        soup.select('.listing-item .price'),
                                        soup.select('.listing-item .date'),
                                        soup.select('.listing-item .features')):
    title = title.get_text().strip()
    price = price.get_text().strip()
    date = date.get_text().strip()
    features = features.get_text().strip()

    print(features)



Output:


Year: 2016

Kilometers: 81,000

Doors: 2 door

Color: White


How can I store Year, Kilometers, Doors, and Color in separate variables?


慕哥6287543
2 Answers

翻过高山走不出你

You can try:

from bs4 import BeautifulSoup as bs
from io import StringIO

data = """<li>
    <ul class="features">
        <li>Year: <strong>2016</strong></li>
        <li>Kilometers: <strong>81,000</strong></li>
    </ul>
    <ul class="features">
        <li>Doors: <strong>2 door</strong></li>
        <li>Color: <strong>White</strong></li>
    </ul>
    <ul class="features">
    </ul>
</li>"""

# Naming the parser explicitly avoids BeautifulSoup's "no parser specified" warning
soup = bs(StringIO(data), 'html.parser')

# Take the text after the colon in every <li> directly under a .features list
Year, Km, Doors, Color = list(map(lambda x: x.text.split(':')[1].strip(), soup.select('.features > li')))
print(Year, Km, Doors, Color)
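
One caveat with the unpacking above: it only works when there are exactly four `<li>` items and they always appear in the same order. As a minimal sketch of a more tolerant variant, assuming every item has the `Label: <strong>value</strong>` shape shown in the question, the values can be collected in a dictionary keyed by label. The function name `parse_features` is my own and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = """<li>
    <ul class="features">
        <li>Year: <strong>2016</strong></li>
        <li>Kilometers: <strong>81,000</strong></li>
    </ul>
    <ul class="features">
        <li>Doors: <strong>2 door</strong></li>
        <li>Color: <strong>White</strong></li>
    </ul>
    <ul class="features">
    </ul>
</li>"""


def parse_features(markup):
    """Return a {label: value} dict built from every <li> under ul.features.

    Hypothetical helper for illustration; it assumes each item looks like
    'Label: <strong>value</strong>'.
    """
    soup = BeautifulSoup(markup, "html.parser")
    features = {}
    for li in soup.select("ul.features > li"):
        # Split on the first colon only, so values that contain ':' stay intact
        label, _, value = li.get_text(strip=True).partition(":")
        features[label.strip()] = value.strip()
    return features


features = parse_features(html)
print(features["Year"], features["Kilometers"], features["Doors"], features["Color"])
```

With a dictionary, a missing item can be handled with `features.get("Color")` instead of raising an unpacking error.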

心有法竹

Find the li elements that contain the label text, then find the next strong tag after each one. Declare empty lists and append to them. Code:

from bs4 import BeautifulSoup

html = '''<li>
    <ul class="features">
        <li>Year: <strong>2016</strong></li>
        <li>Kilometers: <strong>81,000</strong></li>
    </ul>
    <ul class="features">
        <li>Doors: <strong>2 door</strong></li>
        <li>Color: <strong>White</strong></li>
    </ul>
    <ul class="features">
    </ul>
</li>'''

soup = BeautifulSoup(html, 'html.parser')

Year = []
KiloMeter = []
Doors = []
Color = []

# Note: newer soupsieve releases deprecate :contains() in favor of :-soup-contains()
for year, km, dor, colr in zip(soup.select('ul.features li:contains("Year:")'),
                               soup.select('ul.features li:contains("Kilometers:")'),
                               soup.select('ul.features li:contains("Doors:")'),
                               soup.select('ul.features li:contains("Color:")')):
    # The value sits in the <strong> tag that follows each label
    Year.append(year.find_next('strong').text)
    KiloMeter.append(km.find_next('strong').text)
    Doors.append(dor.find_next('strong').text)
    Color.append(colr.find_next('strong').text)

print(Year, KiloMeter, Doors, Color)

Output (lists):

['2016'] ['81,000'] ['2 door'] ['White']
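
Both answers work on a single listing. The code in the question zips flat lists taken from the whole page, and because each listing contains several `ul.features` blocks but presumably only one title and one price, those lists can fall out of step. A rough sketch of looping per listing instead is below. The `.listing-item`, `.title`, and `.price` class names come from the question's selectors, but the wrapper markup and the sample values ("Car A", "Car B", the prices, the second car's details) are invented for illustration, since the question does not show that part of the page.

```python
from bs4 import BeautifulSoup

# Hypothetical page: only the ul.features structure is taken from the question;
# the wrapper elements and sample values are assumptions for this sketch.
page = """
<ul>
  <li class="listing-item">
    <span class="title">Car A</span>
    <span class="price">$10,000</span>
    <ul class="features">
      <li>Year: <strong>2016</strong></li>
      <li>Kilometers: <strong>81,000</strong></li>
    </ul>
    <ul class="features">
      <li>Doors: <strong>2 door</strong></li>
      <li>Color: <strong>White</strong></li>
    </ul>
    <ul class="features"></ul>
  </li>
  <li class="listing-item">
    <span class="title">Car B</span>
    <span class="price">$15,500</span>
    <ul class="features">
      <li>Year: <strong>2018</strong></li>
      <li>Kilometers: <strong>40,500</strong></li>
    </ul>
    <ul class="features">
      <li>Doors: <strong>4 door</strong></li>
      <li>Color: <strong>Black</strong></li>
    </ul>
    <ul class="features"></ul>
  </li>
</ul>
"""

soup = BeautifulSoup(page, "html.parser")

rows = []
for item in soup.select(".listing-item"):
    # Search inside this listing only, so its fields stay together
    row = {
        "title": item.select_one(".title").get_text(strip=True),
        "price": item.select_one(".price").get_text(strip=True),
    }
    for li in item.select("ul.features > li"):
        label, _, value = li.get_text(strip=True).partition(":")
        row[label.strip()] = value.strip()
    rows.append(row)

for row in rows:
    print(row["title"], row["Year"], row["Kilometers"], row["Doors"], row["Color"])
```

Scoping every lookup to `item` keeps each car's title, price, and features in one record no matter how many `ul.features` blocks a listing has.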