BeautifulSoup cannot scrape elements

Hello, I am trying to scrape the following website: https://www.footlocker.co.uk/en/all/new/

I want to scrape the price and the "href" from the following elements:

<span class=" fl-price--sale ">
    <meta itemprop="priceCurrency" content="GBP">
    <meta itemprop="price" content="84.99"><span>£ 84,99</span>
</span>

And this one (the link):


<a href="https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504#!searchCategory=all" data-product-click-link="314102617504" data-hash-key="searchCategory" data-hash-url="https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504" data-testid="fl-product-details-link-314102617504">

I tried this code:


import urllib.request
import bs4 as bs
from bs4 import BeautifulSoup
import requests

proxies = {'type':'ip:port'}

r = requests.get('https://www.footlocker.de/de/alle/new/', proxies=proxies)

soup = BeautifulSoup(r.content, 'html.parser')

# It doesn't find the link...
for a in soup.find_all('a'):
    try:
        if a['href'] == 'https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504#!searchCategory=all':
            print(a['href'])
    except KeyError:
        # Skip <a> tags that have no href attribute
        pass

# It doesn't find the price either...
for price in soup.find_all('span', class_=' fl-price--sale '):
    print(price.text)

I tried scraping through a proxy, but the elements still do not come back (I think the HTML I receive is not the right one).


Thanks for any advice :-) (for educational purposes only)


烙印99
1 Answer

不负相思意

To get the names, links and prices of the products, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.footlocker.co.uk/INTERSHOP/web/FLE/Footlocker-Footlocker_GB-Site/en_GB/-/GBP/ViewStandardCatalog-ProductPagingAjax?SearchParameter=____&sale=new&MultiCategoryPathAssignment=all&PageNumber={}'

for page in range(3):  # <--- increase the number of pages here
    print('Page {}...'.format(page))
    data = requests.get(url.format(page)).json()
    soup = BeautifulSoup(data['content'], 'html.parser')
    for d in soup.select('[data-request]'):
        # Each product tile points to its own JSON endpoint via data-request
        s = BeautifulSoup(requests.get(d['data-request']).json()['content'], 'html.parser')

        print(s.select_one('[itemprop="name"]').text)
        print(s.select_one('[itemprop="price"]')['content'], s.select_one('[itemprop="priceCurrency"]')['content'])
        print(s.a['href'])
        print('-' * 80)

Prints:

Page 0...
adidas Performance Don Issue 2 - Men Shoes
84.99 GBP
https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504
--------------------------------------------------------------------------------
Nike Air Force 1 Crater - Women Shoes
94.99 GBP
https://www.footlocker.co.uk/en/p/nike-air-force-1-crater-women-shoes-98071?v=315349054502
--------------------------------------------------------------------------------
Jordan Jumpmcn Cl Iii Camo - Baby Tracksuits
39.99 GBP
https://www.footlocker.co.uk/en/p/jordan-jumpmcn-cl-iii-camo-baby-tracksuits-91611?v=318280390044
--------------------------------------------------------------------------------
Jordan 13 Retro - Grade School Shoes
99.99 GBP
https://www.footlocker.co.uk/en/p/jordan-13-retro-grade-school-shoes-952?v=316701533404
--------------------------------------------------------------------------------

...and so on.
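
The endpoint name in the answer (`ProductPagingAjax`) and the `data-request` attributes suggest that the product tiles are loaded by separate requests, which would explain why they are missing from the HTML of the listing page itself. The `itemprop` selectors can be tried without network access. What follows is only a minimal sketch, assuming each product fragment resembles the markup quoted in the question; `SAMPLE_HTML` and `extract_product` are hypothetical names introduced here, and the real responses may be structured differently.

```python
from bs4 import BeautifulSoup

# Sample fragment assembled from the markup quoted in the question.
# Assumption: the real Ajax response looks roughly like this; it may not.
SAMPLE_HTML = """
<a href="https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504">
  <span itemprop="name">adidas Performance Don Issue 2 - Men Shoes</span>
  <span class=" fl-price--sale ">
    <meta itemprop="priceCurrency" content="GBP">
    <meta itemprop="price" content="84.99"><span>£ 84,99</span>
  </span>
</a>
"""


def extract_product(fragment):
    """Hypothetical helper: pull name, price, currency and link from one fragment.

    Missing elements yield None instead of raising an exception.
    """
    soup = BeautifulSoup(fragment, "html.parser")
    name = soup.select_one('[itemprop="name"]')
    price = soup.select_one('[itemprop="price"]')
    currency = soup.select_one('[itemprop="priceCurrency"]')
    link = soup.find("a", href=True)  # first <a> that actually has an href
    return {
        "name": name.get_text(strip=True) if name else None,
        "price": price["content"] if price else None,
        "currency": currency["content"] if currency else None,
        "href": link["href"] if link else None,
    }


product = extract_product(SAMPLE_HTML)
print(product)
```

Selecting on `itemprop` reads the machine-readable `content` attribute ("84.99") rather than the display text ("£ 84,99"), and it does not depend on the exact whitespace inside the `class` attribute.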
