Scraping data with Python returns a different HTML tree than DevTools shows

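I'm trying to scrape data from zara.com. I've figured out how to parse the parent element that holds the list of items, but I want to dig deeper: open each item's link and get additional information about it.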
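The two-level crawl described above (list page first, then each item page) can be sketched as follows. This is a minimal sketch assuming the listing HTML contains ordinary `<a href>` links and that item pages end in something like `-p06608389.html`; the `product_links` and `crawl` helper names and the `ITEM_PATH` pattern are hypothetical, and Zara's real markup may differ.

```python
import re
import time
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Hypothetical pattern for item pages such as ".../plain-shirt-p06608389.html"
ITEM_PATH = re.compile(r"-p\d+\.html$")


def product_links(listing_html, base_url):
    """Return absolute, de-duplicated item URLs found in a listing page."""
    soup = BeautifulSoup(listing_html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        url = urljoin(base_url, a["href"])
        # Match on the path only, so query strings do not affect the check
        if ITEM_PATH.search(urlparse(url).path) and url not in links:
            links.append(url)
    return links


def crawl(listing_url, delay=1.0):
    """Fetch a listing page, then every item page it links to (hypothetical helper)."""
    pages = {}
    with requests.Session() as session:
        listing = session.get(listing_url, timeout=10)
        listing.raise_for_status()
        for url in product_links(listing.text, listing_url):
            time.sleep(delay)  # pause between requests
            response = session.get(url, timeout=10)
            response.raise_for_status()
            pages[url] = response.text
    return pages
```

`crawl(listing_url)` returns a dict mapping each item URL to the raw HTML the server sent. Note that it does not execute JavaScript, so anything the page builds client-side will not be in that HTML.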

So I used code like this:

import time

import requests
from bs4 import BeautifulSoup

ListWithRequests = ['https://www.zara.com/nl/en/plain-shirt-p06608389.html']  # In this example only one item

for item in ListWithRequests:
    response = requests.get(item, verify=False)  # verify=False skips TLS certificate checks
    soup2 = BeautifulSoup(response.text, "html.parser")
    time.sleep(1)  # pause between requests

    # Save the raw HTML returned by the server so it can be inspected
    with open("demo.html", "w", encoding="utf-8") as f:
        f.write(response.text)

For example, I want to get the item's price, which is a block in DevTools, or the item ID:

<span class="_colorName">White
</span>
</p>
</div>

But in the demo.html file I get a completely different tree, and I can't find any of the elements I need.
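One way to confirm what is going on is to run the same CSS selector against the HTML that requests saved and against the DOM you see in DevTools. A minimal sketch: `has_element` is a hypothetical helper, and the two HTML strings are illustrative stand-ins, not Zara's actual markup.

```python
from bs4 import BeautifulSoup


def has_element(html, css_selector):
    """Return True if css_selector matches at least one element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(css_selector) is not None


# Illustrative stand-ins: an empty app shell (what a JavaScript-rendered page
# may send to requests) versus the DOM you see after rendering in DevTools.
server_html = '<div id="app"></div><script src="app.js"></script>'
rendered_html = '<div id="app"><p><span class="_colorName">White</span></p></div>'

print(has_element(server_html, "span._colorName"))    # False
print(has_element(rendered_html, "span._colorName"))  # True
```

In practice you would pass the contents of demo.html as the first argument. If a selector matches in DevTools but not in the saved file, the element is most likely added by JavaScript after the page loads (or the server sent the script a different page).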

Please tell me what I'm doing wrong.

杨魅力


GCT1015

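The page content is rendered by JavaScript, and requests does not execute JavaScript, so bs4 only ever sees the initial HTML and cannot render the page. You could use selenium in this case, but I noticed that the data you're looking for is actually present in a script tag, so you can easily load it as JSON or capture it quickly. I used re:

import re

import requests


def main(url):
    r = requests.get(url, timeout=10)
    match = re.search(r'"price": "(.*?)"', r.text)
    if match is None:  # pattern absent, e.g. the page layout changed
        print("price not found")
        return
    price = match.group(1)
    print(price)


main("https://www.zara.com/nl/en/plain-shirt-p06608389.html")

Output:

25.95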
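The answer mentions that the script-tag data can also be loaded as JSON instead of being matched with a regex. A minimal sketch of that, assuming the data sits in a JSON-LD block (`<script type="application/ld+json">`) with a schema.org-style `offers.price` field. That layout is an assumption, not confirmed for zara.com, and `prices_from_json_ld` and the sample HTML are hypothetical.

```python
import json

from bs4 import BeautifulSoup


def prices_from_json_ld(html):
    """Collect "price" values from JSON-LD <script> blocks (assumed layout)."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        # JSON-LD may hold a single object or a list of objects
        items = data if isinstance(data, list) else [data]
        for obj in items:
            offers = obj.get("offers", {}) if isinstance(obj, dict) else {}
            # "offers" may itself be an object or a list of objects
            for offer in offers if isinstance(offers, list) else [offers]:
                if isinstance(offer, dict) and "price" in offer:
                    prices.append(offer["price"])
    return prices


# Hypothetical sample markup, for illustration only
sample = '''<script type="application/ld+json">
{"@type": "Product", "offers": {"@type": "Offer", "price": "25.95"}}
</script>'''
print(prices_from_json_ld(sample))  # ['25.95']
```

Parsing the JSON is less sensitive than the regex to details such as whitespace around the colon, though both approaches break if the site moves the data elsewhere.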
