Hello, I'm trying to scrape the following website: https://www.footlocker.co.uk/en/all/new/
I want to scrape the price and the "href" from the following elements:
<span class=" fl-price--sale ">
<meta itemprop="priceCurrency" content="GBP">
<meta itemprop="price" content="84.99"><span>£ 84,99</span>
</span>
and this one (the product link):
<a href="https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504#!searchCategory=all" data-product-click-link="314102617504" data-hash-key="searchCategory" data-hash-url="https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504" data-testid="fl-product-details-link-314102617504">
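To check that the two targets can be read once the markup is actually present, here is a minimal offline sketch that parses the snippets above. It swaps in the standard-library html.parser.HTMLParser instead of BeautifulSoup so it runs without extra packages; the class name ProductParser and the abridged SAMPLE_HTML string are hypothetical, and it assumes the price lives in the meta itemprop="price" tag inside span.fl-price--sale and the link is the a tag carrying data-product-click-link.

```python
from html.parser import HTMLParser

# Abridged copy of the snippets from the question (hypothetical test input)
SAMPLE_HTML = """
<span class=" fl-price--sale ">
<meta itemprop="priceCurrency" content="GBP">
<meta itemprop="price" content="84.99"><span>£ 84,99</span>
</span>
<a href="https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504#!searchCategory=all" data-product-click-link="314102617504">
"""


class ProductParser(HTMLParser):
    """Collects sale prices and product links from static HTML."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self.links = []
        self._sale_depth = 0  # >0 while inside span.fl-price--sale (tracks nested spans)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and "data-product-click-link" in a:
            self.links.append(a.get("href"))
        elif tag == "span":
            if self._sale_depth:
                self._sale_depth += 1  # nested span inside the price block
            elif "fl-price--sale" in (a.get("class") or "").split():
                self._sale_depth = 1  # class value has spaces, so split before comparing
        elif tag == "meta" and self._sale_depth and a.get("itemprop") == "price":
            self.prices.append(a.get("content"))  # machine-readable price, e.g. "84.99"

    def handle_endtag(self, tag):
        if tag == "span" and self._sale_depth:
            self._sale_depth -= 1


parser = ProductParser()
parser.feed(SAMPLE_HTML)
print(parser.prices)
print(parser.links)
```

If this works on the snippet but finds nothing on the downloaded page, the problem is in what the request returns, not in the selectors.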
I tried this code:
import requests
from bs4 import BeautifulSoup

# Placeholder proxy: requests expects scheme keys ('http'/'https'), not 'type'
proxies = {'https': 'http://ip:port'}
r = requests.get('https://www.footlocker.co.uk/en/all/new/', proxies=proxies)
soup = BeautifulSoup(r.content, 'html.parser')

# It doesn't find it...
for a in soup.find_all('a', href=True):
    if a['href'] == 'https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504#!searchCategory=all':
        print(a['href'])

# It doesn't find it either...
# BeautifulSoup splits the class attribute on whitespace, so match the bare class name
for price in soup.find_all('span', class_='fl-price--sale'):
    print(price.text)
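I also tried scraping through a proxy, but it still doesn't find the elements (I think the HTML I get back isn't the same as what the browser shows).
Thanks for any advice :-) (this is for educational purposes only)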
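One way to test the "the HTML isn't right" suspicion is to download the page and check whether the class and attribute names from the snippets appear in the raw response at all. The sketch below uses the standard-library urllib.request; the helper names fetch and missing_markers, the marker list, and the "Mozilla/5.0" User-Agent value are assumptions for illustration. If the markers are missing, a plausible (unconfirmed) explanation is that the product grid is filled in by JavaScript after load, or that the site returned a bot-check page instead of the listing.

```python
import urllib.request

# Strings taken from the snippets in the question; if the listing is
# server-rendered they should appear somewhere in the raw HTML
MARKERS = ("fl-price--sale", "data-product-click-link")


def missing_markers(html):
    """Return the markers that do not occur anywhere in the given HTML text."""
    return [m for m in MARKERS if m not in html]


def fetch(url, proxy=None):
    """Download a page and return (status, text); proxy is a placeholder like 'http://ip:port'."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    # Browser-like User-Agent (assumed value); some sites reject the default one
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener.open(req, timeout=20) as resp:  # raises HTTPError on 4xx/5xx
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Usage: status, html = fetch("https://www.footlocker.co.uk/en/all/new/") followed by print(status, missing_markers(html)). An empty list means the elements are in the response and the parsing code is at fault; a full list points to the page content itself.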