Unable to extract the contents of a script tag with BeautifulSoup

soup.find('script',type='application/ld+json').text returns an empty string. Why can't I extract the text?


>>> soup = BeautifulSoup(page.text,'lxml')
>>> soup.find('script',type='application/ld+json').text
''
>>> soup.find('script',type='application/ld+json')
<script type="application/ld+json">{"@context":"http://schema.org","@type":"Organization","name":"Hamilton Medical Group - Dunkeld","url":"https://www.healthdirect.gov.au/australian-health-services/23000130/hamilton-medical-group-dunkeld/services/dunkeld-3294-sterling","contactPoint":{"@type":"ContactPoint","telephone":"03 5572 2422","email":"","website":"http://www.hamiltonmedicalgroup.net.au","fax":"03 5571 1606"},"address":{"@type":"PostalAddress","streetAddress":"14 Sterling Street","addressLocality":"DUNKELD","addressRegion":"VIC","postalCode":"3294","addressCountry":"AU"}}</script>

>>> json.loads(soup.find('script',type='application/ld+json'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'json' is not defined
>>> import json
>>> json.loads(soup.find('script',type='application/ld+json'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\*******\Python38\lib\json\__init__.py", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not Tag


子衿沉夜
1 Answer

喵喔喔

Use the .string attribute to get the <script> data. Note that json.loads() needs a str, not a Tag, which is why passing the result of soup.find() directly raises a TypeError:

import json
from bs4 import BeautifulSoup

html_text = '''<script type="application/ld+json">{"@context":"http://schema.org","@type":"Organization","name":"Hamilton Medical Group - Dunkeld","url":"https://www.healthdirect.gov.au/australian-health-services/23000130/hamilton-medical-group-dunkeld/services/dunkeld-3294-sterling","contactPoint":{"@type":"ContactPoint","telephone":"03 5572 2422","email":"","website":"http://www.hamiltonmedicalgroup.net.au","fax":"03 5571 1606"},"address":{"@type":"PostalAddress","streetAddress":"14 Sterling Street","addressLocality":"DUNKELD","addressRegion":"VIC","postalCode":"3294","addressCountry":"AU"}}</script>'''

soup = BeautifulSoup(html_text, 'html.parser')

# .string returns the tag's single text child as a str
parsed_data = json.loads(soup.find('script',type='application/ld+json').string)

# print parsed data to screen:
print(json.dumps(parsed_data, indent=4))

Prints:

{
    "@context": "http://schema.org",
    "@type": "Organization",
    "name": "Hamilton Medical Group - Dunkeld",
    "url": "https://www.healthdirect.gov.au/australian-health-services/23000130/hamilton-medical-group-dunkeld/services/dunkeld-3294-sterling",
    "contactPoint": {
        "@type": "ContactPoint",
        "telephone": "03 5572 2422",
        "email": "",
        "website": "http://www.hamiltonmedicalgroup.net.au",
        "fax": "03 5571 1606"
    },
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "14 Sterling Street",
        "addressLocality": "DUNKELD",
        "addressRegion": "VIC",
        "postalCode": "3294",
        "addressCountry": "AU"
    }
}
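
As for why .text came back empty: as far as I know, some Beautiful Soup 4 releases deliberately leave <script> contents out of .text / get_text(), while .string still returns them, so the result may depend on the installed version. If a page can contain several JSON-LD blocks, or none at all, a slightly more defensive variant may be useful. The following is only a sketch: the helper name extract_json_ld and the shortened sample_html are made up for illustration and are not part of the question.

```python
import json
from bs4 import BeautifulSoup


def extract_json_ld(html):
    """Return a list with every JSON-LD object that could be parsed from html.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tag in soup.find_all("script", type="application/ld+json"):
        raw = tag.string  # None when the tag is empty or has several children
        if not raw or not raw.strip():
            continue
        try:
            results.append(json.loads(raw))
        except json.JSONDecodeError:
            # Skip blocks whose content is not valid JSON
            continue
    return results


# Shortened sample document (assumed for the example)
sample_html = """
<html><head>
<script type="application/ld+json">{"@type": "Organization", "name": "Hamilton Medical Group - Dunkeld", "contactPoint": {"telephone": "03 5572 2422"}}</script>
<script type="application/ld+json"></script>
<script>var x = 1;</script>
</head></html>
"""

blocks = extract_json_ld(sample_html)
print(blocks[0]["name"])
print(blocks[0]["contactPoint"]["telephone"])
```

Using find_all() instead of find() avoids an AttributeError on None when the page has no matching tag; the function simply returns an empty list in that case.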