Scraping strings from HTML using python3-beautifulsoup3

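I'm trying to use beautifulsoup to get strings from table rows. The strings I want are "SANDAL" and "SHORTS", in the second and third rows. I know this could be solved with regular expressions or string functions, but I want to learn beautifulsoup and use it as much as possible.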


Python code snippet


    from bs4 import BeautifulSoup

    soup = BeautifulSoup(page, 'html.parser')
    table = soup.find('table')
    row = table.find_next('tr')  # first row (the header)
    row = row.find_next('tr')    # second row (SANDAL)

HTML


    <html>
    <body>
    <div id="body">
    <div class="data">

    <table id="products">

    <tr><td>PRODUCT<td class="ole1">ID<td class="c1">TYPE<td class="ole1">WHEN<td class="ole4">ID<td class="ole4">ID</td></tr>
    <tr><td>SANDAL<td class="ole1">77313<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878717</td></tr>
    <tr><td>SHORTS<td class="ole1">77314<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878718</td></tr>

    </table>

    </div>
    </div>
    </body>
    </html>


阿晨1998
1 answer

冉冉说

To get the text from the first column of the table (without the header row), you can use this script:

    from bs4 import BeautifulSoup

    txt = '''
        <html>
        <body>
        <div id="body">
        <div class="data">
        <table id="products">
        <tr><td>PRODUCT<td class="ole1">ID<td class="c1">TYPE<td class="ole1">WHEN<td class="ole4">ID<td class="ole4">ID</td></tr>
        <tr><td>SANDAL<td class="ole1">77313<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878717</td></tr>
        <tr><td>SHORTS<td class="ole1">77314<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878718</td></tr>
        </table>
        </div>
        </div>
        </body>
        </html>'''

    soup = BeautifulSoup(txt, 'lxml')  # <-- lxml is important here (to parse the HTML code correctly)

    for tr in soup.find('table', id='products').find_all('tr')[1:]:  # <-- [1:] because we want to skip the header
        print(tr.td.text)  # <-- print contents of first <td> tag

Prints:

    SANDAL
    SHORTS
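If lxml is not installed, a possible variation is to keep the built-in `html.parser` and read only the first text node of each row's first `<td>`. This is a minimal sketch, not part of the original answer. It assumes that `html.parser` does not close the unclosed `<td>` tags implicitly, so the cells end up nested inside one another and `tr.td.text` would return the whole row. The helper name `first_column` and its `parser` parameter are hypothetical, introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Same malformed markup as in the question: the <td> tags are never closed
# until the end of each row.
html = """
<table id="products">
<tr><td>PRODUCT<td class="ole1">ID<td class="c1">TYPE<td class="ole1">WHEN<td class="ole4">ID<td class="ole4">ID</td></tr>
<tr><td>SANDAL<td class="ole1">77313<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878717</td></tr>
<tr><td>SHORTS<td class="ole1">77314<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878718</td></tr>
</table>
"""


def first_column(markup, parser="html.parser"):
    """Return the first-column text of every non-header row (hypothetical helper)."""
    soup = BeautifulSoup(markup, parser)
    table = soup.find("table", id="products")
    names = []
    for tr in table.find_all("tr")[1:]:  # [1:] skips the header row
        if tr.td is None:  # row without any cell
            continue
        # find(string=True) returns the first text node inside the first <td>,
        # so it does not matter whether later cells are nested inside it.
        first_text = tr.td.find(string=True)
        if first_text is not None:
            names.append(first_text.strip())
    return names


print(first_column(html))
```

Because only the first text node is read, the same function should also work when `parser="lxml"` is passed, where the cells are parsed as siblings instead.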