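I'm trying to scrape some data from an e-commerce site for a personal project. I want to build a nested list of strings from the HTML, but one part of the HTML is causing problems. Each list item looks like this: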
<div class="impressions" data-impressions='{"id":"01920","name":"Sleepy","price":12.95,"brand":"Lush","category":"Bubble Bar","variant":"7 oz.","quantity":1,"list":"/bath/bubble-bars/sleepy/9999901920.html","dimension11":"","dimension12":"Naked,Self Preserving,Vegan","dimension13":1,"dimension14":1,"dimension15":true}'></div>
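Since the value of `data-impressions` is a JSON object, one way to avoid splitting it by hand is to read the attribute and pass it to `json.loads`. The sketch below uses only the standard library's `html.parser`; the class name `ImpressionParser` is hypothetical, and it assumes every `data-impressions` value is valid JSON.

```python
import json
from html.parser import HTMLParser


class ImpressionParser(HTMLParser):
    """Hypothetical helper: collect the parsed JSON of every data-impressions attribute."""

    def __init__(self):
        super().__init__()
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quotes around the value are already stripped
        for name, value in attrs:
            if name == "data-impressions" and value:
                self.products.append(json.loads(value))


SAMPLE = """<div class="impressions" data-impressions='{"id":"01920","name":"Sleepy","price":12.95,"brand":"Lush","category":"Bubble Bar","variant":"7 oz.","quantity":1,"list":"/bath/bubble-bars/sleepy/9999901920.html","dimension11":"","dimension12":"Naked,Self Preserving,Vegan","dimension13":1,"dimension14":1,"dimension15":true}'></div>"""

parser = ImpressionParser()
parser.feed(SAMPLE)
product = parser.products[0]
print(product["dimension12"])  # Naked,Self Preserving,Vegan
print(product["price"])        # 12.95
```

Commas inside string values such as `dimension12` stay intact, and numbers and booleans come back as Python `float`, `int` and `bool`.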
What I have now is a regex that extracts everything inside the data-impressions attribute, followed by a split at the commas:
import re
# bathshower_impressions holds the scraped impression <div> strings
list_return = [re.findall(r"\{([^{]+[^}'></div>])", i) for i in bathshower_impressions]
list_return = [re.split(",", found[0]) for found in list_return]
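If you prefer to keep a regex step, note that the text it captures is the body of a JSON object with its braces removed. A minimal sketch, assuming `bathshower_impressions` is a list of HTML strings like the one above (the list below is a hypothetical stand-in containing just that example), captures the braces too and parses the result with `json.loads` instead of `re.split`:

```python
import json
import re

# Hypothetical stand-in for the question's list of scraped HTML strings.
bathshower_impressions = [
    """<div class="impressions" data-impressions='{"id":"01920","name":"Sleepy","price":12.95,"brand":"Lush","category":"Bubble Bar","variant":"7 oz.","quantity":1,"list":"/bath/bubble-bars/sleepy/9999901920.html","dimension11":"","dimension12":"Naked,Self Preserving,Vegan","dimension13":1,"dimension14":1,"dimension15":true}'></div>""",
]

# Greedy match from the first "{" to the last "}", braces included.
OBJECT_RE = re.compile(r"\{.*\}")

products = []
for snippet in bathshower_impressions:
    match = OBJECT_RE.search(snippet)
    if match:
        # json.loads understands quoting, so commas inside values are not split.
        products.append(json.loads(match.group(0)))

# One [key, value] pair per field, analogous to the second-level lists in the question.
pairs = [[key, value] for key, value in products[0].items()]
print(pairs[9])  # ['dimension12', 'Naked,Self Preserving,Vegan']
```

Unlike the pairs in the question, keys and string values come back without their surrounding quotes, and `price` is a `float` rather than a string.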
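This gives me a list of lists for each product, whose entries will become key:value pairs in a dictionary. For the example above, the second-level items are: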
[['"id"', '"01920"'],
['"name"', '"Sleepy"'],
['"price"', '12.95'],
['"brand"', '"Lush"'],
['"category"', '"Bubble Bar"'],
['"variant"', '"7 oz."'],
['"quantity"', '1'],
['"list"', '"/bath/bubble-bars/sleepy/9999901920.html"'],
['"dimension11"', '""'],
['"dimension12"', '"Naked'],
['Self Preserving'],
['Vegan"'],
['"dimension13"', '1'],
['"dimension14"', '1'],
['"dimension15"', 'true']]
My problem is dimension12: I don't know how to keep its value from being split at the commas, so that its entry comes out as:
['"dimension12"', '"Naked,Self Preserving,Vegan"']
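To keep the string-splitting approach, the split has to ignore commas that sit inside double quotes. A common regex trick is to split only on commas followed by an even number of quote characters. This sketch assumes no value contains an escaped quote (`\"`); `split_pairs` is a hypothetical helper name, and the input is a fragment of the example data:

```python
import re

# Match a comma only if an even number of double quotes follows it,
# i.e. the comma is not inside a quoted string. Escaped quotes are not handled.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_pairs(body):
    """Split 'k1:v1,k2:v2,...' into [[k1, v1], [k2, v2], ...], ignoring quoted commas."""
    # maxsplit=1 keeps any later colons (e.g. in a URL) inside the value
    return [item.split(":", 1) for item in COMMA_OUTSIDE_QUOTES.split(body)]


body = '"dimension11":"","dimension12":"Naked,Self Preserving,Vegan","dimension13":1'
for pair in split_pairs(body):
    print(pair)
# ['"dimension11"', '""']
# ['"dimension12"', '"Naked,Self Preserving,Vegan"']
# ['"dimension13"', '1']
```

The lookahead rescans the rest of the string at every comma, which is fine for short attributes like these; because it does not understand JSON escapes, `json.loads` remains the more robust option.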
Any help would be much appreciated, thanks.
繁花如伊