I want to extract data from a web page such as https://www.glassdoor.com/Overview/Working-at-Apple-EI_IE1138.11,16.htm, and return the result in the following format:
Website        Headquarters   Size              Revenue                      Type
www.apple.com  Cupertino, CA  10000+ employees  $10+ billion (USD) per year  Company - Public (AAPL)
I then used the following BeautifulSoup code to get this:
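import re
# Match every <span> whose class contains "value", then drop duplicates (set() does not preserve order)
all_href = com_soup.find_all('span', {'class': re.compile('value')})
all_href = list(set(all_href))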
This returns the results still wrapped in <span> tags. It also does not include the <label> tags, as shown below:
[<span class="value"> Computer Hardware & Software</span>,
<span class="value"> Company - Public (AAPL) </span>,
<span class="value">10000+ employees</span>,
<span class="value"> $10+ billion (USD) per year</span>,
<span class="value-title" title="4.0"></span>,
<span class="value">Cupertino, CA</span>,
<span class="value"> 1976</span>,
<span class="value-title" title="5.0"></span>,
<span class="value website"><a class="link" href="http://www.apple.com" rel="nofollow noreferrer" target="_blank">www.apple.com</a></span>]
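
One possible way to get plain text paired with its label is sketched below. This is only a sketch under assumptions: the SAMPLE_HTML markup, the "infoEntity" wrapper class, and the extract_overview helper are hypothetical and were made up for illustration, since the real page structure is not shown in the question. It assumes each <label> is immediately followed by a sibling <span> whose class list includes "value". The calls used (find_all, find_next_sibling, get_text) are standard BeautifulSoup methods.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real Glassdoor page may be structured differently.
SAMPLE_HTML = """
<div class="infoEntity"><label>Website</label><span class="value website"><a class="link" href="http://www.apple.com">www.apple.com</a></span></div>
<div class="infoEntity"><label>Headquarters</label><span class="value">Cupertino, CA</span></div>
<div class="infoEntity"><label>Size</label><span class="value">10000+ employees</span></div>
<div class="infoEntity"><label>Revenue</label><span class="value"> $10+ billion (USD) per year</span></div>
<div class="infoEntity"><label>Type</label><span class="value"> Company - Public (AAPL) </span></div>
"""


def extract_overview(html):
    """Map each <label> text to the text of the "value" <span> that follows it."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # Walk the labels in document order so the output order is stable,
    # unlike list(set(...)), which discards the original order.
    for label in soup.find_all("label"):
        # class_="value" matches "value" and "value website",
        # but not "value-title", which is a different class name.
        value = label.find_next_sibling("span", class_="value")
        if value is not None:
            # get_text(strip=True) returns the text without tags or padding.
            result[label.get_text(strip=True)] = value.get_text(strip=True)
    return result


overview = extract_overview(SAMPLE_HTML)
print(" ".join(overview.keys()))
print(" ".join(overview.values()))
```

Iterating over the labels rather than over the spans keeps each value tied to its field name, so only the wanted columns need to be selected afterwards. If the real page nests the label and the value differently, the find_next_sibling call would have to be adapted.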