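I'm trying to scrape some data from the page https://www.cellartracker.com/m/wines/12344, but I can't work out how to get the values that aren't inside any element with a class. Here is the part of the page's markup I'm looking at: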
```html
<ul class="twin-set-list">
<li><span>Vintage</span> 2000</li>
<li><span>Type</span> Red</li>
<li><span>Producer</span> Balnaves of Coonawarra</li>
<li><span>Varietal</span> Cabernet Sauvignon</li>
<li><span>Designation</span> The Tally Reserve</li>
<li><span>Vineyard</span> n/a</li>
<li><span>Country</span> Australia</li>
<li><span>Region</span> South Australia</li>
<li><span>SubRegion</span> Limestone Coast</li>
<li><span>Appellation</span> Coonawarra</li>
</ul>
```
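Values such as 2000 and Red have no class of their own, so what approach can I use to get them? I tried the following Python code (only the HTML portion of the page is included below):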
```python
from bs4 import BeautifulSoup

html = """<ul class="twin-set-list">
<li><span>Vintage</span> 2000</li>
<li><span>Type</span> Red</li>
<li><span>Producer</span> Balnaves of Coonawarra</li>
<li><span>Varietal</span> Cabernet Sauvignon</li>
<li><span>Designation</span> The Tally Reserve</li>
<li><span>Vineyard</span> n/a</li>
<li><span>Country</span> Australia</li>
<li><span>Region</span> South Australia</li>
<li><span>SubRegion</span> Limestone Coast</li>
<li><span>Appellation</span> Coonawarra</li>
</ul>"""

soup = BeautifulSoup(html, 'html.parser')
need = {}
# Outer loop: every <ul class="twin-set-list"> (despite the name li_tag)
for li_tag in soup.find_all('ul', {'class': 'twin-set-list'}):
    # Inner loop: every <li> inside that list (despite the name span_tag)
    for span_tag in li_tag.find_all('li'):
        field = span_tag.find('span').text
        # This reads the same <span> text again, so value ends up equal to field
        value = span_tag.find('span').text
        need[field] = value
print(need)
```
Can anyone suggest how to extract this data?
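One possible approach, sketched here under the assumption that the live page serves the same markup as the snippet above (the page itself is not fetched in this sketch): each value is a bare text node that directly follows its `<span>`, so it can be reached through the span's `next_sibling` rather than by class. The sketch uses `soup.select` with a CSS child selector, which Beautiful Soup supports from version 4.7 onward via soupsieve.

```python
from bs4 import BeautifulSoup

# Copy of the snippet from the question (assumed to match the live page)
html = """<ul class="twin-set-list">
<li><span>Vintage</span> 2000</li>
<li><span>Type</span> Red</li>
<li><span>Producer</span> Balnaves of Coonawarra</li>
<li><span>Varietal</span> Cabernet Sauvignon</li>
<li><span>Designation</span> The Tally Reserve</li>
<li><span>Vineyard</span> n/a</li>
<li><span>Country</span> Australia</li>
<li><span>Region</span> South Australia</li>
<li><span>SubRegion</span> Limestone Coast</li>
<li><span>Appellation</span> Coonawarra</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

need = {}
# Each <li> directly under the list holds a <span> label plus a bare text value
for li in soup.select("ul.twin-set-list > li"):
    span = li.find("span")
    field = span.get_text(strip=True)
    # next_sibling is the text node right after </span>, e.g. " 2000"
    value = span.next_sibling.strip()
    need[field] = value

print(need)
```

If an `<li>` could contain extra markup around the value, `li.stripped_strings` is an alternative: for `<li><span>Vintage</span> 2000</li>` it yields `'Vintage'` and then `'2000'`, so the first two items give the field and the value.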