Scraping a JS-rendered page with Requests_HTML does not work as expected

I am working on scraping a JS-rendered page (https://www.flipkart.com/search?q=Acer+Laptops). On this page the product images are loaded dynamically. Before rendering, the SRC value of these images is:

//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg

After rendering, the SRC should look like this:

https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70
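Besides pointing at different files, the two SRC values differ in form: the placeholder is protocol-relative (it starts with "//" and has no scheme), while the rendered one is a full https URL. A minimal sketch for normalizing and telling them apart, assuming that a file name starting with "placeholder" reliably marks an unloaded image (inferred only from the one URL above). normalize_src and is_placeholder are hypothetical helper names, not part of requests_html:

```python
from urllib.parse import urljoin, urlparse


def normalize_src(src: str, base: str = "https://www.flipkart.com/") -> str:
    """Turn protocol-relative ("//host/path") or relative SRC values into absolute URLs."""
    # urljoin leaves absolute URLs unchanged and borrows the scheme from `base` for "//..." ones.
    return urljoin(base, src)


def is_placeholder(src: str) -> bool:
    """Heuristic (assumption): True if the image file name starts with "placeholder"."""
    # urlparse keeps the query string (?q=70) out of .path, so only the file name is checked.
    file_name = urlparse(normalize_src(src)).path.rsplit("/", 1)[-1]
    return file_name.startswith("placeholder")


if __name__ == "__main__":
    before = "//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg"
    after = ("https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/"
             "acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70")
    print(normalize_src(before))
    print(is_placeholder(before), is_placeholder(after))  # True False
```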

With requests_html I can get the rendered SRC values, but only for the first few images at the top of the page. Could someone please help? My code:

from requests_html import HTMLSession

session = HTMLSession()
res = session.get("https://www.flipkart.com/search?q=Acer+Laptops")
res.html.render()

all_results = res.html.find('#container > div > div.t-0M7P._2doH3V > div._3e7xtJ > div._1HmYoV.hCUpcT > div:nth-child(2)', first=True)  # Container for all the results

items = all_results.find('._1UoZlX')  # Container for each product being displayed

for item in items:
    item_image = item.find('div._3BTv9X img', first=True).attrs.get('src')
    print(item_image)

Output:


https://rukminim1.flixcart.com/image/312/312/kamtsi80/computer/m/8/y/acer-na-gaming-laptop-original-imafs5prytwgrcyf.jpeg?q=70
https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg

As you can see, the first two images are loaded and the rest are not. Thanks, everyone!
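The pattern in the output (only the topmost items resolved) is typical of images that are lazy-loaded as they scroll into the viewport. A plain render() does not scroll, so items further down would keep the placeholder. That is an assumption about Flipkart, not something verified here. requests_html's HTML.render() does accept scrolldown (how many times to press Page Down) and sleep (a pause in seconds) parameters, so one thing worth trying is a sketch like the following. fetch_image_srcs and split_srcs are hypothetical helper names, and scrolldown=10, sleep=1 and timeout=30 are guesses to tune:

```python
from typing import Iterable, List, Tuple

SEARCH_URL = "https://www.flipkart.com/search?q=Acer+Laptops"


def split_srcs(srcs: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split SRC values into (loaded, still_placeholder) by file name (heuristic)."""
    loaded: List[str] = []
    pending: List[str] = []
    for src in srcs:
        # Drop the query string, then keep only the last path segment.
        file_name = src.split("?", 1)[0].rsplit("/", 1)[-1]
        (pending if file_name.startswith("placeholder") else loaded).append(src)
    return loaded, pending


def fetch_image_srcs(url: str = SEARCH_URL, scrolldown: int = 10, sleep: int = 1) -> List[str]:
    """Render the page while paging down, then return every product image SRC."""
    # Imported lazily so split_srcs() works without requests_html/Chromium installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        res = session.get(url)
        # scrolldown = number of Page Down presses; sleep = pause in seconds,
        # intended to give lazy-loaded images time to swap in their real SRC.
        res.html.render(scrolldown=scrolldown, sleep=sleep, timeout=30)
        # Selector copied from the question; Flipkart's generated class names may change.
        return [img.attrs.get("src", "") for img in res.html.find("div._3BTv9X img")]
    finally:
        session.close()
```

Usage: loaded, pending = split_srcs(fetch_image_srcs()). If pending is still non-empty, raising scrolldown or sleep is the next thing to try.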


红颜莎娜