
Extracting <label> and <span> tags from HTML with Python

I want to scrape a web page such as https://www.glassdoor.com/Overview/Working-at-Apple-EI_IE1138.11,16.htm and return the result in the following format:


Website        Headquarters   Size              Revenue                      Type
www.apple.com  Cupertino, CA  10000+ employees  $10+ billion (USD) per year  Company - Public (AAPL)
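Once the five values have been extracted, printing them in that layout is plain string formatting. A minimal sketch, assuming the scraped data is already in a dict keyed by the column names (the values here are copied from the desired output above; `format_table` is a hypothetical helper, not part of BeautifulSoup):

```python
# Values copied from the desired output; in practice they come from the scraper.
info = {
    "Website": "www.apple.com",
    "Headquarters": "Cupertino, CA",
    "Size": "10000+ employees",
    "Revenue": "$10+ billion (USD) per year",
    "Type": "Company - Public (AAPL)",
}


def format_table(row: dict) -> str:
    """Render one record as a header line plus a value line with aligned columns."""
    # Each column is as wide as its longest cell (header or value) plus 2 spaces.
    widths = [max(len(k), len(v)) + 2 for k, v in row.items()]
    header = "".join(k.ljust(w) for k, w in zip(row.keys(), widths)).rstrip()
    values = "".join(v.ljust(w) for v, w in zip(row.values(), widths)).rstrip()
    return header + "\n" + values


print(format_table(info))
```

Because dicts keep insertion order, the columns appear in the order the keys were added.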

I then used the following BeautifulSoup code to get this:


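import re

# com_soup: BeautifulSoup object for the company overview page
all_href = com_soup.find_all('span', {'class': re.compile('value')})
# set() removes duplicates but does not preserve the order of the page
all_href = list(set(all_href))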
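Since a `set` has no defined order, `list(set(...))` hands the elements back in arbitrary order, which is why the website value ends up last in the output shown in the question. A minimal sketch of order-preserving de-duplication, assuming the items are plain strings (for example the span texts); `dedupe_keep_order` is a hypothetical helper name:

```python
def dedupe_keep_order(items):
    # dict keys are unique and, since Python 3.7, keep insertion order,
    # so the first occurrence of each item wins and page order is preserved.
    return list(dict.fromkeys(items))


texts = ["Cupertino, CA", "1976", "Cupertino, CA", "10000+ employees"]
print(dedupe_keep_order(texts))  # ['Cupertino, CA', '1976', '10000+ employees']
```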

It returns the values still wrapped in <span> tags. In addition, the <label> tags are not shown, as you can see below:


[<span class="value"> Computer Hardware &amp; Software</span>,
 <span class="value"> Company - Public (AAPL) </span>,
 <span class="value">10000+ employees</span>,
 <span class="value"> $10+ billion (USD) per year</span>,
 <span class="value-title" title="4.0"></span>,
 <span class="value">Cupertino, CA</span>,
 <span class="value"> 1976</span>,
 <span class="value-title" title="5.0"></span>,
 <span class="value website"><a class="link" href="http://www.apple.com" rel="nofollow noreferrer" target="_blank">www.apple.com</a></span>]
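To get "label: value" pairs instead of bare <span> tags, one approach is to read each <label> and then take the text of the next <span class="value ..."> after it. The sketch below assumes each field on the page is marked up as a <label> followed by such a span (the `SAMPLE_HTML` snippet is hypothetical, modeled on the output above, not copied from Glassdoor). It uses the standard library's `html.parser` so it runs without extra packages; with BeautifulSoup the same idea is commonly written by looping over `soup.find_all('label')` and calling `label.find_next_sibling('span')` and `.get_text(strip=True)`.

```python
from html.parser import HTMLParser

# Hypothetical markup: assumes every field is a <label> followed by a
# <span class="value ..."> holding the value (possibly with nested tags).
SAMPLE_HTML = """
<div class="infoEntity"><label>Website</label>
  <span class="value website"><a class="link" href="http://www.apple.com">www.apple.com</a></span></div>
<div class="infoEntity"><label>Headquarters</label><span class="value">Cupertino, CA</span></div>
<div class="infoEntity"><label>Size</label><span class="value">10000+ employees</span></div>
<div class="infoEntity"><label>Founded</label><span class="value"> 1976</span></div>
<span class="value-title" title="4.0"></span>
"""


class LabelValueParser(HTMLParser):
    """Collect {label text: text of the next <span> whose classes include 'value'}."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._label = None      # last label text still waiting for its value
        self._in_label = False
        self._span_depth = 0    # > 0 while inside a matching value span
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self._in_label = True
            self._buf = []
        elif tag == "span":
            if self._span_depth:
                self._span_depth += 1  # nested span inside the value span
            else:
                classes = (dict(attrs).get("class") or "").split()
                # "value-title" splits to ["value-title"], so it is skipped
                if "value" in classes and self._label is not None:
                    self._span_depth = 1
                    self._buf = []

    def handle_endtag(self, tag):
        if tag == "label" and self._in_label:
            self._in_label = False
            self._label = "".join(self._buf).strip()
        elif tag == "span" and self._span_depth:
            self._span_depth -= 1
            if self._span_depth == 0:
                self.pairs[self._label] = "".join(self._buf).strip()
                self._label = None

    def handle_data(self, data):
        # Text inside the label, or anywhere inside the value span (e.g. the <a>)
        if self._in_label or self._span_depth:
            self._buf.append(data)


parser = LabelValueParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.pairs)
```

Because each value is keyed by its label, the five columns wanted above can then be picked out by name rather than by position in an unordered list.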


Asked by 呼唤远方