Scrapy: ignoring empty items


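I built this small bot, which works through a list of search parameters. It works fine until a page shows more than one result: product_prices_euros then returns a list in which half of the items are blank. So when I concatenate it with product_prices_cents, I get output like this: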

'price' : '',76

for half of the results. Is there a simple way to avoid collecting the blank items? My product_prices_euros output looks like:

[' 1', ' ', ' 2', ' ', ' 2', ' ', ' 1', ' ', ' 1', ' ', ' 1', ' ', ' 2', ' ']

I only want to keep the ' 1', ' 2', and so on.

Here is the markup that my CSS selectors run against. Something in it may be the cause:

    <span class="product-pricing__main-price">
        <span class="cents">,79€</span>
    </span>

And my code:

    # Methods of the spider class; the module needs `import csv` and `import scrapy`.
    def start_requests(self):
        base_url = "https://new.carrefour.fr/s?q="
        test_file = open(r"example", "r")
        reader = csv.reader(test_file)
        for row in reader:
            # Skip blank rows; the first column holds the search term.
            if row:
                url = row[0]
                absolute_url = base_url + url
                print(absolute_url)
                yield scrapy.Request(absolute_url, meta={'dont_redirect': True, "handle_httpstatus_list": [302, 301]}, callback=self.parse)

    def parse(self, response):
        # Each selector returns one flat list for the whole page.
        product_name = response.css("h2.label.title::text").extract()
        product_packaging = response.css("div.label.packaging::text").extract()
        product_price_euros = response.css("span.product-pricing__main-price::text").extract()
        product_price_cents = response.css("span.cents::text").extract()
        # zip pairs the lists by position only.
        for name, packaging, price_euro, price_cent in zip(product_name, product_packaging, product_price_euros, product_price_cents):
            yield {'ean': response.css("h1.page-title::text").extract(), 'name': name + packaging, 'price': price_euro + price_cent}

Any ideas? :)

潇潇雨雨


2 Answers

函数式编程

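If you simply filter out the blank euro elements, how will you match the remaining ones to the right cents? First, IMHO, it would be easier to iterate over the products and collect each one's data. For example:

    for product in response.css('.product-list__item'):
        name = product.css("h2.label.title::text").extract()
        # ...

That way you can get the euros and cents like this:

    >>> product.css('.product-pricing__main-price  ::text').getall()
    ['2', ',99€']
    >>> ''.join(product.css('.product-pricing__main-price  ::text').getall())
    '2,99€'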
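The joining step in the per-product approach can be sketched without a live page. The helper below, `join_price`, is a hypothetical name and not part of Scrapy; it assumes its argument is the list of text fragments that `.getall()` would return for a single product, and the sample fragments are made up for illustration.

```python
def join_price(fragments):
    """Join the text fragments of one price element into a single string.

    `fragments` stands in for the list returned by
    product.css('.product-pricing__main-price ::text').getall();
    the function name is illustrative, not a Scrapy API.
    """
    # Strip each fragment so whitespace-only text nodes become empty strings,
    # then concatenate what is left.
    return "".join(fragment.strip() for fragment in fragments)


# Fragments as they might appear for one product (assumed sample data).
print(join_price([" 2", ",99€", " "]))  # 2,99€
```

Because the fragments come from one product at a time, the euros and cents cannot drift out of step the way they can when separate page-wide lists are zipped together.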


波斯汪

Finally, you can filter the entries you don't want out of your list: list(filter(lambda a: a != '', yourList))
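Note that the sample list in the question holds whitespace-only entries (' '), not empty strings, so a comparison against '' alone would keep them. A minimal sketch that strips each entry before testing it; `drop_blank` is a hypothetical helper name, and the input is the list quoted in the question.

```python
def drop_blank(values):
    """Return the stripped, non-empty strings from `values`."""
    # Strip first, so that ' ' becomes '' and is treated as blank.
    stripped = (value.strip() for value in values)
    return [value for value in stripped if value]


euros = [' 1', ' ', ' 2', ' ', ' 2', ' ', ' 1', ' ', ' 1', ' ', ' 1', ' ', ' 2', ' ']
print(drop_blank(euros))  # ['1', '2', '2', '1', '1', '1', '2']
```

This only realigns the euros with the cents if every product contributes exactly one non-blank euro fragment; if a product could lack a price, pairing by position would still go wrong.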
