蛊毒传说
There are two ways to do this: an easy one that is wrong, and a less easy one that is correct. I won't recommend the easy way to you.

The correct way is to use a JavaScript parser. For modern JavaScript, esprima is a good choice. It has an interactive online demo and is also available as a Python module.

    import esprima

    # script body as extracted from BeautifulSoup
    script_text = """
    var myvar = {
        productid: "101",
        productname: "Abc",
    };
    """

    tokens = esprima.tokenize(script_text)

There isn't much going on in a script this simple, so the raw token list is enough to get the values you want. It looks like this:

    [
        { "type": "Keyword", "value": "var" },
        { "type": "Identifier", "value": "myvar" },
        { "type": "Punctuator", "value": "=" },
        { "type": "Punctuator", "value": "{" },
        { "type": "Identifier", "value": "productid" },
        { "type": "Punctuator", "value": ":" },
        { "type": "String", "value": "\"101\"" },
        { "type": "Punctuator", "value": "," },
        { "type": "Identifier", "value": "productname" },
        { "type": "Punctuator", "value": ":" },
        { "type": "String", "value": "\"Abc\"" },
        { "type": "Punctuator", "value": "," },
        { "type": "Punctuator", "value": "}" },
        { "type": "Punctuator", "value": ";" }
    ]

Iterate over the list and pick out the values you need:

    token_iterator = iter(tokens)
    for token in token_iterator:
        if token["type"] == "Identifier" and token["value"] == "productname":
            # the next token is the ":" punctuator, so skip it
            next(token_iterator)
            # the token after that holds the associated value (still quoted, e.g. '"Abc"')
            value_token = next(token_iterator)
            productname = value_token["value"]

For more complex cases, you may need to parse the script into a tree and walk the tree:

    tree = esprima.parse(script_text)

The tree is more complex (you can inspect it on the interactive page), but in exchange it carries all the context information that the flat token list lacks. You would then use the visitor pattern to walk the tree to the spot you care about. If you're interested, the Python package includes an example of how to use the visitor pattern.
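The single-key loop generalizes to collecting every key/value pair of the object literal. Below is a minimal sketch, assuming a flat literal whose values are plain double-quoted strings: it scans for consecutive Identifier, ":", String tokens. The token list is the one from the `tokenize()` output, hardcoded as plain dicts so the sketch runs without esprima installed; `object_pairs` is a hypothetical helper name, not part of esprima.

```python
import ast

# Token list from the esprima.tokenize() output, hardcoded as plain dicts
# so this sketch runs without esprima installed.
TOKENS = [
    {"type": "Keyword", "value": "var"},
    {"type": "Identifier", "value": "myvar"},
    {"type": "Punctuator", "value": "="},
    {"type": "Punctuator", "value": "{"},
    {"type": "Identifier", "value": "productid"},
    {"type": "Punctuator", "value": ":"},
    {"type": "String", "value": '"101"'},
    {"type": "Punctuator", "value": ","},
    {"type": "Identifier", "value": "productname"},
    {"type": "Punctuator", "value": ":"},
    {"type": "String", "value": '"Abc"'},
    {"type": "Punctuator", "value": ","},
    {"type": "Punctuator", "value": "}"},
    {"type": "Punctuator", "value": ";"},
]


def object_pairs(tokens):
    """Hypothetical helper: map each Identifier ':' String triple to a dict entry.

    Assumes a flat object literal; nested objects, arrays and numeric
    values are simply skipped.
    """
    pairs = {}
    for i in range(len(tokens) - 2):
        key, colon, value = tokens[i], tokens[i + 1], tokens[i + 2]
        if (key["type"] == "Identifier"
                and colon["type"] == "Punctuator" and colon["value"] == ":"
                and value["type"] == "String"):
            # String token values keep their JS quotes; literal_eval removes
            # them (assumes no JS-only escape sequences in the string).
            pairs[key["value"]] = ast.literal_eval(value["value"])
    return pairs


print(object_pairs(TOKENS))  # {'productid': '101', 'productname': 'Abc'}
```

With esprima installed you would pass the real `tokens` list instead of `TOKENS`, provided its entries support `token["type"]`-style access as in the loop in this answer.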
慕田峪7331174
Parse it with BeautifulSoup:

    from bs4 import BeautifulSoup

    script_data = '''
    <script type="text/javascript">
      var myvar = {
        productid: "101",
        productname: "Abc",
      };
    </script>
    '''

    soup = BeautifulSoup(script_data, "html.parser")

soup.script.string holds the data inside the script tag as a string. You can split that string and pick out the data by position:

    soup.script.string.split()

Output:

    ['var', 'myvar', '=', '{', 'productid:', '"101",', 'productname:', '"Abc",', '};']

Product ID:

    soup.script.string.split()[5].split('"')[1]

Output:

    '101'

Product name:

    soup.script.string.split()[7].split('"')[1]

Output:

    'Abc'
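The hardcoded indexes 5 and 7 shift as soon as another property is added before them. A small variation keeps the same split-based approach but looks up the key first and takes the word after it. This is a sketch: `value_after` is a hypothetical helper, the script text is hardcoded (standing in for `soup.script.string`) so it runs without bs4, and it still assumes values are double-quoted and contain no spaces.

```python
# Stand-in for soup.script.string, hardcoded so the sketch runs
# without BeautifulSoup installed.
SCRIPT = '''
    var myvar = {
        productid: "101",
        productname: "Abc",
    };
'''


def value_after(script, key):
    """Hypothetical helper: return the quoted value that follows 'key:'.

    Finds the key in the whitespace-split script instead of relying on a
    fixed index. Raises ValueError if the key is not present.
    """
    words = script.split()
    idx = words.index(key + ":")
    # words[idx + 1] looks like '"101",'; the text between the quotes is the value
    return words[idx + 1].split('"')[1]


print(value_after(SCRIPT, "productid"))    # 101
print(value_after(SCRIPT, "productname"))  # Abc
```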