Python - How do I extract data from inside a variable in a script?

I am new to Python, and I am trying to use BeautifulSoup to extract some data from a variable defined in a script.


data = soup.find_all('script', type='text/javascript')

print(data[0])


<script type="text/javascript">
  var myvar = {
    productid: "101",
    productname: "Abc",
  };
</script>

Do you know a simple way to extract "productid" and "productname" from the myvar variable?


qq_遁去的一_1
3 answers

蛊毒传说

There are two ways: the easy one, which is wrong, and the less easy one, which is correct. I won't recommend the easy way to you.

The correct way is to use a JavaScript parser. For modern JavaScript, esprima is a good choice. There is an interactive online demo, and it is also available as a Python module.

import esprima

# script body as extracted from BeautifulSoup
script_text = """
  var myvar = {
    productid: "101",
    productname: "Abc",
  };
"""

tokens = esprima.tokenize(script_text)

There is not much going on in this simple script, so the raw token list is enough to get the values you want. It looks like this:

[
    {"type": "Keyword", "value": "var"},
    {"type": "Identifier", "value": "myvar"},
    {"type": "Punctuator", "value": "="},
    {"type": "Punctuator", "value": "{"},
    {"type": "Identifier", "value": "productid"},
    {"type": "Punctuator", "value": ":"},
    {"type": "String", "value": "\"101\""},
    {"type": "Punctuator", "value": ","},
    {"type": "Identifier", "value": "productname"},
    {"type": "Punctuator", "value": ":"},
    {"type": "String", "value": "\"Abc\""},
    {"type": "Punctuator", "value": ","},
    {"type": "Punctuator", "value": "}"},
    {"type": "Punctuator", "value": ";"}
]

Iterate over the list and pick out the values you need.

token_iterator = iter(tokens)
for token in token_iterator:
    # esprima's Python tokens expose their fields as attributes
    if token.type == "Identifier" and token.value == "productname":
        # the token after the next must be the one that holds the associated value
        next(token_iterator)  # skip the ":" punctuator
        value_token = next(token_iterator)
        productname = value_token.value  # still quoted, e.g. '"Abc"'

For more complex cases, it may be necessary to parse the script into a tree and walk the tree.

tree = esprima.parse(script_text)

The tree is more complex (you can view it on the interactive page), but in exchange it carries all the context information that the plain token list lacks. You would then use the visitor pattern to walk this tree to a specific location. If you are interested, the Python package includes an example of how to use the visitor pattern.
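
As a rough illustration of the token-based approach, the sketch below collects every `identifier : "string"` pair from a token list in a single pass. To keep it runnable without esprima installed, the tokens are a hand-written stand-in (plain dicts mirroring the dump in this answer); the helper name `collect_string_properties` is hypothetical, and real esprima tokens may need attribute access (`token.type`) rather than dict keys.

```python
# Hypothetical stand-in for the token dump in this answer: plain dicts,
# so the sketch runs without esprima installed.
tokens = [
    {"type": "Keyword", "value": "var"},
    {"type": "Identifier", "value": "myvar"},
    {"type": "Punctuator", "value": "="},
    {"type": "Punctuator", "value": "{"},
    {"type": "Identifier", "value": "productid"},
    {"type": "Punctuator", "value": ":"},
    {"type": "String", "value": '"101"'},
    {"type": "Punctuator", "value": ","},
    {"type": "Identifier", "value": "productname"},
    {"type": "Punctuator", "value": ":"},
    {"type": "String", "value": '"Abc"'},
    {"type": "Punctuator", "value": ","},
    {"type": "Punctuator", "value": "}"},
    {"type": "Punctuator", "value": ";"},
]


def collect_string_properties(tokens):
    """Return {key: value} for every `identifier : "string"` token triple."""
    found = {}
    # Slide a window of three consecutive tokens over the list.
    for key, colon, value in zip(tokens, tokens[1:], tokens[2:]):
        if (
            key["type"] == "Identifier"
            and colon["type"] == "Punctuator"
            and colon["value"] == ":"
            and value["type"] == "String"
        ):
            # String tokens keep their surrounding quotes; drop them.
            found[key["value"]] = value["value"][1:-1]
    return found


properties = collect_string_properties(tokens)
print(properties)
```

This only handles flat objects whose values are plain string literals; nested objects, numbers, or escaped quotes are where walking the parsed tree becomes the safer choice.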

慕田峪7331174

Parsing:

from bs4 import BeautifulSoup

script_data = '''<script type="text/javascript">
  var myvar = {
    productid: "101",
    productname: "Abc",
  };
</script>'''

# name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(script_data, 'html.parser')

soup.script.string holds the contents of the script tag as a string. You can call split on that string to get at the data by position:

soup.script.string.split()

Output:

['var', 'myvar', '=', '{', 'productid:', '"101",', 'productname:', '"Abc",', '};']

Product ID:

soup.script.string.split()[5].split('"')[1]

Output:

'101'

Product name:

soup.script.string.split()[7].split('"')[1]

Output:

'Abc'

慕斯王

For the simple approach, I would use a regex:

import re

# ...
data = soup.find_all('script', type='text/javascript')
productid = re.search(r'productid:\s*"(.*?)"', data[0].text).group(1)
print(productid)
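
The same regex idea can be stretched to pull both fields at once. The following is a minimal sketch, assuming the script text has already been extracted as a string and that every value is a double-quoted string without escaped quotes; the names `PAIR` and `extract_pairs` are made up for illustration.

```python
import re

# Script body as it might come out of the <script> tag (assumed input).
script_text = """
  var myvar = {
    productid: "101",
    productname: "Abc",
  };
"""

# Matches `key: "value"`: a word-like key, a colon, then a lazily matched
# double-quoted string.
PAIR = re.compile(r'(\w+)\s*:\s*"(.*?)"')


def extract_pairs(text):
    """Return a dict of all key/value pairs the pattern finds in text."""
    # findall yields (key, value) tuples because the pattern has two groups.
    return dict(PAIR.findall(text))


fields = extract_pairs(script_text)
print(fields["productid"], fields["productname"])
```

Like any regex over source code, this is brittle: single-quoted values, escaped quotes, or nested objects would need a real parser.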