Saving data retrieved with BeautifulSoup into an array

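Hi everyone, I'm new to BeautifulSoup and don't fully understand how to extract data. I want to extract the top ten titles from Amazon's Best Sellers list and store them in an array.

My goal is to build an Amazon top-10 list and then repeat the process for different categories. I only want to extract each product's title.

Here is my code: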


from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url_amazon = "https://www.amazon.co.uk/Best-Sellers-Electronics/zgbs/electronics"
response = requests.get(url_amazon, headers = headers)
soup = BeautifulSoup(response.content, "lxml")

print(soup.prettify())

title = soup.find("h1", class_ = "a-size-large a-spacing-medium zg-margin-left-15 a-text-bold").text
print(title)

titles = []

for item in soup.findAll("div", attrs = {"class" : "a-fixed-left-grid-col a-col-right"}):
    name = item.find("div", attrs = {"class" : "p13n-sc-truncated"})
    if name is not None:
        titles.append(name.text)
    else:
        titles.append("unknown title")

print(len(titles))

for i in titles:
    print(i)


The output is: "unknown title"


喵喵时光机
1 answer

扬帆大鱼

Your first problem is that the CSS class in the line name = item.find("div", attrs={"class": "p13n-sc-truncated"}) should be p13n-sc-truncate. Your second problem is that the class you use to find the items is too specific (it is specific to the first item). I found it more useful to search for list items with the class zg-item-immersion. If you only want the first 10 items, you can add the [:10] slice to the main for loop. Putting it all together, we get:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9"
}
url_amazon = (
    "https://www.amazon.co.uk/Best-Sellers-Electronics/zgbs/electronics"
)
response = requests.get(url_amazon, headers=headers)
soup = BeautifulSoup(response.content, "lxml")

print(soup.prettify())

title = soup.find(
    "h1", class_="a-size-large a-spacing-medium zg-margin-left-15 a-text-bold"
).text
print(title)

titles = []

for item in soup.findAll("li", attrs={"class": "zg-item-immersion"})[:10]:
    name = item.find("div", attrs={"class": "p13n-sc-truncate"})
    if name is not None:
        titles.append(name.text.strip())
    else:
        titles.append("unknown title")

print(len(titles))

for i in titles:
    print(i)

I used name.text.strip() to remove newlines and extra whitespace. Note that this script is fragile, because Amazon can change its layout and class names at any time.
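Since the goal is to repeat the extraction for several categories, it may help to wrap the parsing step in a function. The following is only a minimal sketch: the function name extract_titles, its parameters, and the SAMPLE_HTML markup are hypothetical and only mimic the structure described above (li elements with class zg-item-immersion containing a div with class p13n-sc-truncate). It uses the built-in "html.parser" instead of "lxml" so that it runs without an extra dependency, and it parses an inline string rather than a live Amazon page.

```python
from bs4 import BeautifulSoup


def extract_titles(html, item_class="zg-item-immersion",
                   title_class="p13n-sc-truncate", limit=10):
    """Return up to `limit` product titles found in `html`.

    The default class names are the ones mentioned in the answer;
    they are assumptions about the page layout and may change.
    """
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # `limit` stops the search after that many matching list items.
    for item in soup.find_all("li", class_=item_class, limit=limit):
        name = item.find("div", class_=title_class)
        if name is not None:
            # strip=True removes surrounding newlines and spaces.
            titles.append(name.get_text(strip=True))
        else:
            titles.append("unknown title")
    return titles


# Hypothetical markup that imitates the structure described in the answer.
SAMPLE_HTML = """
<ol>
  <li class="zg-item-immersion"><div class="p13n-sc-truncate">
      Echo Dot
  </div></li>
  <li class="zg-item-immersion"><div class="other">no title here</div></li>
  <li class="zg-item-immersion"><div class="p13n-sc-truncate">Fire TV Stick</div></li>
</ol>
"""

print(extract_titles(SAMPLE_HTML, limit=2))
```

To cover several categories, one could call this function once per category URL, passing response.content from requests.get each time and storing the resulting lists in a dict keyed by category name.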