Python - Searching for a specific "var" with bs4

I've been trying to learn a bit of web scraping, and I managed to scrape a site that returns a large number of different var values, for example:


var FancyboxI18nClose = 'Close';
var FancyboxI18nNext = 'Next';
var FancyboxI18nPrev = 'Previous';
var PS_CATALOG_MODE = false;
var added_to_wishlist = '.';
var ajax_allowed = true;
var ajaxsearch = true;
var attribute_anchor_separator = '-';
var attributesCombinations = [{"id_attribute":"100","id_attribute_group":"1","attribute":"38_5"},{"id_attribute":"101","id_attribute_group":"1","attribute":"39"},{"id_attribute":"103","id_attribute_group":"1","attribute":"40"},{"id_attribute":"104","id_attribute_group":"1","attribute":"40_5"},{"id_attribute":"105","id_attribute_group":"1","attribute":"41"},{"id_attribute":"107","id_attribute_group":"1","attribute":"42"},{"id_attribute":"108","id_attribute_group":"1","attribute":"42_5"},{"id_attribute":"109","id_attribute_group":"1","attribute":"43"},{"id_attribute":"111","id_attribute_group":"1","attribute":"44"},{"id_attribute":"112","id_attribute_group":"1","attribute":"44_5"},{"id_attribute":"132","id_attribute_group":"1","attribute":"45"},{"id_attribute":"113","id_attribute_group":"1","attribute":"46"}];

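There are of course many more, all of them declared with var. What I want, though, is to grab just one of these values: var attributesCombinations. In other words, I only want to print that one value, so that I can pass it to json.loads and work with the JSON more easily.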


This is what I tried:


try:
    product_li_tags = bs4.find_all(text=re.compile('attributesCombinations'))
except Exception:
    product_li_tags = []

But this returns the whole result, from the very first "var" all the way through to attributesCombinations:


['var CUSTOMIZE_TEXTFIELD = 1;\nvar FancyboxI18nClose = \'Close\';\nvar FancyboxI18nNext = \'Next\';\nvar FancyboxI18nPrev = \'Previous\';\nvar PS_CATALOG_MODE = false;\nvar added_to_wishlist = \'The product was successfully added to your wishlist.\';\nvar ajax_allowed = true;\nvar ajaxsearch = true;\nvar allowBuyWhenOutOfStock = false;\nvar attribute_anchor_separator = \'-\';\nvar attributesCombinations = [{"id_attribute":"100","id_attribute_group":"1","att...........

How can I make it print only var attributesCombinations?


尚方宝剑之说
2 Answers

紫衣仙女

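A regular expression that extracts only the part from attributesCombinations to the end of the array is:

var attributesCombinations = (\[.*?\])

In Python you can compile it with:

re.compile(r'var attributesCombinations = (\[.*?\])')

Capture group 1 holds the array text. Because .*? is non-greedy, the match ends at the first ], which is fine for this data since the array contains no nested brackets.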
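As a minimal sketch of how that pattern could be applied end to end: the snippet below searches a string for the assignment and feeds capture group 1 to json.loads. The script_text sample and the extract_combinations helper are hypothetical stand-ins for illustration; in practice the text would come from the scraped page.

```python
import json
import re

# Hypothetical sample standing in for the text of the page's <script> block.
script_text = """var ajaxsearch = true;
var attribute_anchor_separator = '-';
var attributesCombinations = [{"id_attribute":"100","id_attribute_group":"1","attribute":"38_5"},{"id_attribute":"101","id_attribute_group":"1","attribute":"39"}];
var other = 1;
"""

# The pattern from the answer; group 1 captures the array literal.
pattern = re.compile(r'var attributesCombinations = (\[.*?\])')


def extract_combinations(text):
    """Return the parsed array, or an empty list if the variable is absent."""
    m = pattern.search(text)
    if m is None:
        return []
    return json.loads(m.group(1))


combos = extract_combinations(script_text)
print(combos[0]["attribute"])  # 38_5
```

Returning an empty list when nothing matches mirrors the fallback in the question's try/except, without swallowing unrelated errors.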

慕田峪9158850

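Don't pass re.compile to bs4; run the regex directly against the HTML string:

import json
import re

# htmlString is the raw page source
match = re.compile(r'var\s*attributesCombinations\s*=\s*(\[.*?\])').findall(htmlString)
attributesCombinations = json.loads(match[0])
print(attributesCombinations)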
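One limitation of the non-greedy \[.*?\] pattern is that it stops at the first ], so it would cut the value short if the array ever contained nested brackets. A possible alternative, sketched here and not part of either answer, is to use the regex only to find where the assignment starts and let json.JSONDecoder.raw_decode read one complete JSON value from that position. The extract_js_var helper and the sample string (including its sizes field) are hypothetical. This only works when the JavaScript value is also valid JSON, so a single-quoted string such as 'Close' would raise json.JSONDecodeError.

```python
import json
import re


def extract_js_var(text, name):
    """Return the JSON value assigned in `var <name> = ...`, or None if absent."""
    # Locate the assignment and consume whitespace up to the value itself.
    m = re.search(r'var\s+' + re.escape(name) + r'\s*=\s*', text)
    if m is None:
        return None
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (here, the trailing semicolon).
    value, _end = json.JSONDecoder().raw_decode(text, m.end())
    return value


# Hypothetical page source with a nested array, which the non-greedy
# pattern would truncate at the first closing bracket.
html_string = (
    'var ajax_allowed = true;\n'
    'var attributesCombinations = [{"id_attribute":"100","sizes":[38,39]},'
    '{"id_attribute":"101","sizes":[40]}];\n'
)

print(extract_js_var(html_string, "attributesCombinations"))
```

The same helper can read other JSON-compatible variables, for example extract_js_var(html_string, "ajax_allowed") returns True.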