Finding information in an HTML table with Beautiful Soup

I am trying to extract information from an HTML table (found on this example page: https://www.detrasdelafachada.com/house-for-sale-marianao-havana-cuba/dcyktckvwjxhpl9):


<div class="row">

    <div class="col-label">

        Type of property:

    </div>

    <div class="col-datos">

        Apartment </div>

</div>

<div class="row">

    <div class="col-label">

        Building style:

    </div>

    <div class="col-datos">

        50 year </div>

</div>

<div class="row">

    <div class="col-label precio">

        Sale price:

    </div>

    <div class="col-datos precio">

        12 000 CUC </div>

</div>

<div class="row">

    <div class="col-label">

        Rooms:

    </div>

    <div class="col-datos">

        1 </div>

</div>

<div class="row">

    <div class="col-label">

        Bathrooms:

    </div>

    <div class="col-datos">

        1 </div>

</div>

<div class="row">

    <div class="col-label">

        Kitchens:

    </div>

    <div class="col-datos">

        1 </div>

</div>

<div class="row">

    <div class="col-label">

        Surface:

    </div>

    <div class="col-datos">

        38 mts2 </div>

</div>

<div class="row">

    <div class="col-label">

        Year of construction:

    </div>

    <div class="col-datos">

        1945 </div>

</div>

<div class="row">

    <div class="col-label">

        Building style:

    </div>

    <div class="col-datos">

        50 year </div>

</div>

<div class="row">

    <div class="col-label">

        Construction type:

    </div>

    <div class="col-datos">

        Masonry and plate </div>

</div>

<div class="row">

    <div class="col-label">

        Home conditions:

    </div>

    <div class="col-datos">

        Good </div>

</div>

<div class="row">

    <div class="col-label">

        Other peculiarities:

    </div>

</div>

<div class="row">

Using Beautiful Soup, how can I find the value of "Building style:" (and of the other entries)?


My problem is that I cannot pick out a value directly by class, because all the entries in the table share the same div class names.


温温酱
2 answers

HUWWW

You can iterate over each row div and collect the text of its nested divs:

    from bs4 import BeautifulSoup as soup
    import re

    # content is assumed to hold the HTML of the page as a string
    d = soup(content, 'html.parser')
    # For every row div, take the text of each nested div and strip
    # newlines and runs of two or more whitespace characters
    results = [[re.sub(r'\s{2,}|\n+', '', i.text) for i in b.find_all('div')] for b in d.find_all('div', {'class': 'row'})]

Output:

    [['Type of property:', 'Apartment '], ['Building style:', '50 year '], ['Sale price:', '12 000 CUC '], ['Rooms:', '1 '], ['Bathrooms:', '1 '], ['Kitchens:', '1 '], ['Surface:', '38 mts2 '], ['Year of construction:', '1945 '], ['Building style:', '50 year '], ['Construction type:', 'Masonry and plate '], ['Home conditions:', 'Good '], ['Other peculiarities:'], []]
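Since the goal is to look up the value that belongs to a given label, the row-by-row traversal above can also be folded into a dictionary. The following is only a sketch under a few assumptions: `content` is a shortened stand-in for the page markup, `rows_to_dict` is a hypothetical helper name, and a row without a `col-datos` cell (such as "Other peculiarities:") is mapped to `None`.

```python
from bs4 import BeautifulSoup

# Shortened sample in the same shape as the markup in the question (assumption).
content = """
<div class="row">
    <div class="col-label">
        Type of property:
    </div>
    <div class="col-datos">
        Apartment </div>
</div>
<div class="row">
    <div class="col-label">
        Building style:
    </div>
    <div class="col-datos">
        50 year </div>
</div>
<div class="row">
    <div class="col-label">
        Other peculiarities:
    </div>
</div>
"""


def rows_to_dict(html):
    """Hypothetical helper: map each label (without its colon) to its value."""
    parsed = BeautifulSoup(html, "html.parser")
    table = {}
    for row in parsed.find_all("div", class_="row"):
        label = row.find("div", class_="col-label")
        if label is None:
            continue  # skip rows that have no label cell
        value = row.find("div", class_="col-datos")
        key = label.get_text(strip=True).rstrip(":")
        # Rows with a label but no value cell are stored as None
        table[key] = value.get_text(strip=True) if value is not None else None
    return table


details = rows_to_dict(content)
print(details["Building style"])  # 50 year
```

Note that a dictionary keeps only the last value for a repeated label; "Building style:" appears twice in the question's markup, with the same value both times.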

慕的地6264312

If you know you specifically want to look up the string "Building style:", for example, you can grab its .next_sibling, or simply use find_next:

    >>> from bs4 import BeautifulSoup
    >>> html = "<c><div>hello</div> <div>hi</div></c>"
    >>> soup = BeautifulSoup(html, 'html.parser')
    >>> print(soup.find(string="hello").find_next('div').contents[0])
    hi

If you want all of them, you can use .find_all to get every div tag with the class "row", and then get the children of each one.

    data = []
    soup = BeautifulSoup(html, 'html.parser')
    # Collect the stripped text of the nested divs, one list per row
    for row in soup.find_all('div', class_="row"):
        rowdata = [c.text.strip() for c in row.find_all('div')]
        data.append(rowdata)
    print(data)
    # Outputs the nested list:
    #   [['Type of property:', 'Apartment'], ['Building style:', '50 year'], etc.]
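One caveat when applying the single-label lookup to the markup in the question: each label there is surrounded by newlines and indentation, so an exact `string="Building style:"` match would likely find nothing. A possible workaround, sketched here with a hypothetical helper `value_for` and a shortened sample `html`, is to match the label with a regular expression and then step from the label div to its next sibling div.

```python
import re
from bs4 import BeautifulSoup

# Shortened sample in the same shape as the markup in the question (assumption).
html = """
<div class="row">
    <div class="col-label">
        Building style:
    </div>
    <div class="col-datos">
        50 year </div>
</div>
<div class="row">
    <div class="col-label precio">
        Sale price:
    </div>
    <div class="col-datos precio">
        12 000 CUC </div>
</div>
<div class="row">
    <div class="col-label">
        Other peculiarities:
    </div>
</div>
"""


def value_for(parsed, label):
    """Hypothetical helper: return the value next to a label, or None."""
    # A regex search tolerates the whitespace around the label text
    text_node = parsed.find(string=re.compile(re.escape(label)))
    if text_node is None:
        return None  # label not present on the page
    # Step from the label div to the value div that follows it in the same row
    value_div = text_node.find_parent("div").find_next_sibling("div")
    return value_div.get_text(strip=True) if value_div is not None else None


page = BeautifulSoup(html, "html.parser")
print(value_for(page, "Building style:"))  # 50 year
print(value_for(page, "Sale price:"))      # 12 000 CUC
```

Using `find_next_sibling` rather than `find_next` keeps the lookup inside the current row, so a label without a value cell yields `None` instead of picking up a div from the following row.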