Beautiful Soup selector returns an empty list

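I'm working through the Automate the Boring Stuff course and trying to scrape the Amazon price of the Automate the Boring Stuff book. No matter what I try, soup.select() returns an empty list, so elems[0].text.strip() fails with an IndexError, and I don't know what to do.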


import bs4
import requests


def getAmazonPrice(productUrl):
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0'} # to make the server think it's a web browser and not a bot
    res = requests.get(productUrl, headers=headers)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # select() returns a list of matching elements; it is empty if nothing matches
    elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last')
    return elems[0].text.strip()  # raises IndexError when elems is empty


price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1')
print('The price is ' + price)
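Since select() always returns a list, a selector that matches nothing only surfaces one line later as an IndexError. A minimal sketch, assuming beautifulsoup4 is installed; select_text is a hypothetical helper name, and the HTML is a made-up snippet rather than Amazon's real markup. It fails with a message that includes the page's <title>, which can hint at whether the downloaded page is the product page at all:

```python
import bs4


def select_text(html, selector):
    """Return the stripped text of the first element matching a CSS selector.

    select_text is a hypothetical helper: instead of letting elems[0] fail
    with a bare IndexError, it raises a LookupError that names the selector
    and the page title.
    """
    soup = bs4.BeautifulSoup(html, 'html.parser')
    elems = soup.select(selector)  # always a list; empty when nothing matches
    if not elems:
        title = soup.title.get_text(strip=True) if soup.title is not None else '(no <title>)'
        raise LookupError(f'No element matches {selector!r}; page title: {title!r}')
    return elems[0].get_text(strip=True)


# Offline demo on a hand-written snippet with a placeholder price
sample = '<html><head><title>Demo</title></head><body><div id="price"> $12.34 </div></body></html>'
print(select_text(sample, '#price'))  # prints $12.34

try:
    select_text(sample, '#mediaNoAccordion')
except LookupError as err:
    print(err)  # the selector matched nothing in this snippet
```

Passing res.text from the question's code into select_text would then report the title of whatever page Amazon actually returned.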


MMMHUHU
浏览 62回答 2
2回答

繁华开满天机

You need to change the parser to lxml and use headers = {'user-agent': 'Mozilla/5.0'}:

import bs4
import requests

def getAmazonPrice(productUrl):
    headers = {'user-agent': 'Mozilla/5.0'} # to make the server think it's a web browser and not a bot
    res = requests.get(productUrl, headers=headers)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'lxml')  # 'lxml' requires the lxml package (pip install lxml)
    # select_one() returns the first matching element, or None if nothing matches
    elem = soup.select_one('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last')
    return elem.text.strip()

price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1')
print('The price is ' + price)

If you want to use select instead:

import bs4
import requests

def getAmazonPrice(productUrl):
    headers = {'user-agent': 'Mozilla/5.0'} # to make the server think it's a web browser and not a bot
    res = requests.get(productUrl, headers=headers)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last')
    return elems[0].text.strip()

price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1')
print('The price is ' + price)

Or try this:

import bs4
import requests

def getAmazonPrice(productUrl):
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0'}  # to make the server think it's a web browser and not a bot
    res = requests.get(productUrl, headers=headers)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last')
    return elems[0].text.strip()

price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1')
print('The price is ' + price)
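The select_one and select variants in this answer report a miss differently: select() returns a list that is empty when nothing matches, while select_one() returns the first matching Tag or None. A small offline sketch of that difference, assuming beautifulsoup4 is installed; the markup is made up for illustration (not Amazon's page), and it uses the built-in 'html.parser' so it runs without lxml:

```python
import bs4

# Made-up markup for illustration only; not Amazon's real page structure
html = '<div class="a-row"><span class="price">$12.34</span></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# select() returns a list of matching Tags; it is empty when nothing matches
hits = soup.select('span.price')
misses = soup.select('span.missing')
print(len(hits), len(misses))  # prints 1 0

# select_one() returns the first matching Tag, or None when nothing matches
hit = soup.select_one('span.price')
miss = soup.select_one('span.missing')
print(hit.text, miss)  # prints $12.34 None

# Each style needs its own guard before reading .text:
# misses[0] would raise IndexError, and miss.text would raise AttributeError
price_from_select = misses[0].text.strip() if misses else None
price_from_select_one = miss.text.strip() if miss is not None else None
print(price_from_select, price_from_select_one)  # prints None None
```

Switching from select to select_one therefore changes the error you see on a miss (AttributeError instead of IndexError) but not whether the element was found.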

函数式编程

Your request triggers a 503 error from Amazon, probably because of Amazon's anti-scraping measures, so you may want to consider a different approach.

import requests

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0'} # to make the server think it's a web browser and not a bot
productUrl = 'https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1'
res = requests.get(productUrl, headers=headers)
print(res)

Output:

<Response [503]>
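The status check this answer describes can be made explicit before any parsing. Note that the question's code already calls res.raise_for_status(), which raises requests.HTTPError on a 503 before select() ever runs. A minimal sketch, assuming the requests library; check_response and fetch_html are hypothetical helper names, and treating a 503 as "probably blocked" is an assumption taken from this answer:

```python
import requests


def check_response(res):
    """Raise a descriptive error if the server refused the request.

    check_response is a hypothetical helper. A 503 from a site like Amazon
    may mean the request was throttled or blocked, so parsing the body for
    a price is unlikely to work.
    """
    if res.status_code == 503:
        raise RuntimeError(
            f'Got 503 Service Unavailable for {res.url}; the server probably '
            'refused or throttled the request, so the body is unlikely to be the product page'
        )
    res.raise_for_status()  # any other 4xx/5xx status raises requests.HTTPError


def fetch_html(url, headers=None, timeout=10):
    """Download url and return its HTML only if the response status is usable."""
    res = requests.get(url, headers=headers, timeout=timeout)
    check_response(res)
    return res.text
```

Usage, with headers and productUrl as defined in this answer: html = fetch_html(productUrl, headers=headers).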
打开App,查看更多内容
随时随地看视频慕课网APP

相关分类

Html5