
How do I extract JSON from a script tag with Beautiful Soup?

I want to use Beautiful Soup to extract `reviewCount` from the script tag below. I've tried several approaches, but none of them worked.


<script type="application/json" data-initial-state="review-filter">

{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}

</script>


慕田峪7331174
3 answers

jeck猫

This should work, though I'm quite sure there's a more elegant way:

    import json
    from bs4 import BeautifulSoup

    html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

    soup = BeautifulSoup(html, 'html.parser')
    # The snippet contains a single <script> tag, so find() returns it
    res = soup.find('script')
    # The tag's only child is the JSON text
    json_object = json.loads(res.contents[0])

    for language in json_object['languages']:
        print('{}: {}'.format(language['displayName'], language['reviewCount']))

Output:

    Toutes les langues: 573
    français: 567
    English: 6
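One possible variant, assuming the markup from the question (including the newlines around the JSON): match the tag on its `data-initial-state` attribute instead of taking the first `<script>`, read its text with `.string`, and build a dict from `isoCode` to an integer count. The name `review_counts` is only illustrative.

```python
import json
from bs4 import BeautifulSoup

html = '''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''

soup = BeautifulSoup(html, "html.parser")
# Match on the data attribute so other <script> tags on a real page are ignored
tag = soup.find("script", attrs={"data-initial-state": "review-filter"})
# .string is the tag's single text child; json.loads skips the surrounding newlines
data = json.loads(tag.string)

# reviewCount is stored as a string in the JSON, so convert it to int
review_counts = {lang["isoCode"]: int(lang["reviewCount"]) for lang in data["languages"]}
print(review_counts)  # {'all': 573, 'fr': 567, 'en': 6}
```

Converting to `int` is optional; drop it if you only need the raw strings.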

慕无忌1623718

Import `json`, load the script's text with `json.loads`, then iterate over the languages to get every `reviewCount`:

    import json
    from bs4 import BeautifulSoup

    html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

    soup = BeautifulSoup(html, "html.parser")
    # Select the script tag by its data attribute and take its text
    item = soup.select_one('script[data-initial-state="review-filter"]').text
    jsondata = json.loads(item)

    for language in jsondata['languages']:
        print(language['reviewCount'])

Output:

    573
    567
    6
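On a real page `select_one` returns `None` when nothing matches, and calling `.text` on `None` raises `AttributeError`. A small sketch that guards against that, wrapped in a hypothetical helper `extract_review_counts`; the sample `page` is made up, with a second, unrelated JSON script tag to show that the attribute selector picks the right one.

```python
import json
from bs4 import BeautifulSoup

def extract_review_counts(html):
    """Return (displayName, reviewCount) pairs from the review-filter script, or [] if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one('script[type="application/json"][data-initial-state="review-filter"]')
    if tag is None:
        # No matching tag: return an empty list instead of crashing on None
        return []
    data = json.loads(tag.string)
    return [(lang["displayName"], lang["reviewCount"]) for lang in data.get("languages", [])]

page = '''<html><body>
<script type="application/json" data-initial-state="other">{"languages": []}</script>
<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}]}</script>
</body></html>'''

print(extract_review_counts(page))
# [('Toutes les langues', '573'), ('français', '567'), ('English', '6')]
print(extract_review_counts("<html></html>"))
# []
```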

慕妹3242003

    import re

    html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

    # Capture the quoted value that follows each reviewCount key
    match = [item.group(1) for item in re.finditer(r'reviewCount":"(.+?)"', html)]
    print(match)

Output:

    ['573', '567', '6']
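The regex works on this exact string, but it depends on the JSON being serialized with no space after the colon. A quick check, using a trimmed sample of the question's data (an assumption for illustration): the same data re-serialized with spaces defeats the original pattern, while `json.loads` and a whitespace-tolerant pattern still find the value.

```python
import json
import re

pattern = r'reviewCount":"(.+?)"'
# Allows optional whitespace around the colon
tolerant = r'"reviewCount"\s*:\s*"([^"]*)"'

compact = '{"languages":[{"isoCode":"fr","reviewCount":"567"}]}'
# Same data, serialized with a space after each colon and comma
spaced = json.dumps(json.loads(compact), separators=(", ", ": "))

print(re.findall(pattern, compact))   # ['567']
print(re.findall(pattern, spaced))    # [] because '":"' no longer occurs
print(re.findall(tolerant, spaced))   # ['567']
print([lang["reviewCount"] for lang in json.loads(spaced)["languages"]])  # ['567']
```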