BeautifulSoup can't find a div with a specific class

http://img3.mukewang.com/61e653b40001005b14410391.jpg

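For some background: I've been trying to learn web scraping so that I can collect images for a machine learning project involving a CNN. I've been trying to scrape some images from a website (the screenshot shows the page's HTML on the left and my code on the right), without success: my code ends up printing/returning an empty list. Is there something I'm doing wrong?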


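For what it's worth, I tried finding other div tags that have an "id" instead of a "class", and that did work, but for some reason it can't find the tag I'm looking for.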
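Both `id` and `class` lookups work in BeautifulSoup as long as the element is actually present in the HTML being parsed. The difference is that `id` is an ordinary single-valued attribute, while `class` is treated as multi-valued (split on whitespace into a list), and a string passed as `class_` matches if it equals any one of those classes. The sketch below uses hypothetical static markup, loosely modeled on the question and not taken from the real Grailed page, to show the three common lookups side by side.

```python
from bs4 import BeautifulSoup

# Hypothetical static markup (NOT the real page), used only for illustration.
HTML = """
<div id="main">
  <div class="listing-cover-photo">
    <img alt="Hoodie A" src="https://example.com/a.webp">
  </div>
  <div class="listing-cover-photo lazy">
    <img alt="Hoodie B" src="https://example.com/b.webp">
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# id is single-valued: a plain string comparison.
by_id = soup.find("div", id="main")

# class is multi-valued: a tag matches if ANY of its classes equals the string,
# so the second div (classes "listing-cover-photo" and "lazy") matches too.
by_class = soup.find_all("div", class_="listing-cover-photo")

# The equivalent CSS selector.
by_css = soup.select("div.listing-cover-photo")

print(by_id is not None)  # True
print(len(by_class))      # 2
print(len(by_css))        # 2
```

If lookups like these return nothing on a real page, the usual suspect is that the element is not in the HTML that was downloaded, rather than the lookup syntax.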


Edit:


import urllib3
from bs4 import BeautifulSoup

http = urllib3.PoolManager()
url = 'https://www.grailed.com/shop/EkpEBRw4rw'

# Download the raw HTML exactly as the server sends it (urllib3 does not run JavaScript)
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')

img_div = soup.find_all('div', {'class': "listing-cover-photo "})
print(img_div)  # prints an empty list
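An empty list from `find_all` means that no tag in the parsed HTML matched. Before tweaking the selector, it can be worth checking whether the class appears in the downloaded HTML at all; if it does not, the content is probably inserted later by JavaScript, which urllib3 never executes. A minimal sketch of such a check follows; the helper name `has_match` and the two sample inputs are hypothetical.

```python
from bs4 import BeautifulSoup


def has_match(html, css_selector):
    """Return True if the parsed HTML contains an element matching css_selector.

    `html` may be str or bytes (e.g. urllib3's response.data).
    """
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(css_selector) is not None


# Hypothetical inputs: a server-rendered fragment versus a JavaScript "shell"
# page whose content would only be filled in by a browser.
server_rendered = b'<div class="listing-cover-photo"><img src="a.webp"></div>'
js_shell = b'<div id="app"></div><script src="/bundle.js"></script>'

print(has_match(server_rendered, ".listing-cover-photo"))  # True
print(has_match(js_shell, ".listing-cover-photo"))         # False
```

With the code above, `has_match(response.data, ".listing-cover-photo")` would show whether the listing markup is present in the raw response.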

Edit 2:


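from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = 'https://www.grailed.com/shop/EkpEBRw4rw'

# Selenium 4.10+ no longer accepts executable_path; pass the driver path via Service
driver = webdriver.Chrome(service=Service('chromedriver.exe'))
try:
    driver.get(url)  # Chrome loads the page and runs its JavaScript
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    driver.quit()

listing = soup.select('.listing-cover-photo ')
for item in listing:
    print(item.select('img'))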

Output:


[<img alt="Off-White Off White Caravaggio Hoodie" src="https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/cache=expiry:max/rotate=deg:exif/resize=width:480,height:640,fit:crop/output=format:webp,quality:70/compress/https://cdn.fs.grailed.com/api/file/yX8vvvBsTaugadX0jssT"/>]
(...a few more of these...)
[<img alt="Off-White Off-White Arrows Hoodie Black" src="https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/cache=expiry:max/rotate=deg:exif/resize=width:480,height:640,fit:crop/output=format:webp,quality:70/compress/https://cdn.fs.grailed.com/api/file/9CMvJoQIRaqgtK0u9ov0"/>]
[]
[]
[]
[]
(...many more empty lists...)
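The thread does not establish why only the first few `.listing-cover-photo` elements contain an `<img>`. One possibility, which is an assumption not checked against the site, is lazy loading: the page only inserts each image once its card scrolls into view. Either way, when collecting training images it is simpler to flatten the results into a list of `src` URLs and skip empty containers. A sketch; the helper name `extract_image_urls` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def extract_image_urls(html):
    """Collect the src of every <img> inside a .listing-cover-photo element.

    Containers with no <img>, and <img> tags without a src, are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # Descendant selector: empty containers simply contribute nothing.
    for img in soup.select(".listing-cover-photo img"):
        src = img.get("src")
        if src:
            urls.append(src)
    return urls


# Hypothetical markup mixing filled and empty containers.
SAMPLE = """
<div class="listing-cover-photo"><img alt="A" src="https://example.com/a.webp"></div>
<div class="listing-cover-photo"></div>
<div class="listing-cover-photo"><img alt="B" src="https://example.com/b.webp"></div>
<div class="listing-cover-photo"><img alt="no src"></div>
"""

print(extract_image_urls(SAMPLE))
# ['https://example.com/a.webp', 'https://example.com/b.webp']
```

Using one descendant selector instead of selecting containers and then their images avoids producing an empty list per container.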


慕雪6442864

慕姐8265434

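It looks like the site loads its data with JavaScript, so the listings are not in the HTML that urllib3 downloads. Try using Selenium together with Beautiful Soup:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = "https://www.grailed.com/shop/EkpEBRw4rw"
# Selenium 4.10+ removed executable_path; pass the chromedriver path via Service
browser = webdriver.Chrome(service=Service("/path/to/chromedriver.exe"))
browser.get(url)  # the browser executes the page's JavaScript
soup = BeautifulSoup(browser.page_source, "html.parser")
browser.quit()
items = soup.select(".listing-cover-photo ")
print(items)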