I use the code below to crawl multiple links on a page and extract a list of data from each linked page:
carspider.py:
# Needs `from scrapy.selector import Selector`; CarscrapeItem is defined in the project's items module.
def parse_item(self, response):
    sel = Selector(response)
    item = CarscrapeItem()
    # Every field lives in the same "key details" block, so select it once.
    details = sel.xpath('//div[@class="listing__section listing__section--key-details listing__key-details portable-one-whole push--bottom"]')
    item['carType'] = details.xpath('.//span[@itemprop="manufacturer"]//text()').get()
    item['model'] = details.xpath('.//span[@itemprop="model"]//text()').get()
    # Text nodes of the right-aligned value spans, in document order.
    values = details.xpath('.//span[@class="float--right"]//text()').getall()
    item['variant'] = values[3]
    item['year'] = values[4]
    item['engineCapacity'] = values[5]
    item['transmission'] = values[6]
    item['seatCapacity'] = values[7]
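I want to remove the duplicate car types and append the remaining values of each row to the existing entry for that car type. I think that structure would work better for a recommendation system. Is it possible to do this with Scrapy? I searched for answers about duplicate values. Most of them are about the duplicates filter, and the other filters I tried did not work for me.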
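To make the goal concrete, here is a minimal sketch of the result I am after. It is plain Python rather than a Scrapy API, and it assumes each scraped item can be treated as a dict with the field names used above. The function name `merge_by_car_type` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def merge_by_car_type(rows):
    """Group scraped rows by 'carType', keeping one entry per car type."""
    grouped = defaultdict(list)
    for row in rows:
        # Everything except the grouping key is appended to that car type's list.
        rest = {key: value for key, value in row.items() if key != "carType"}
        if rest not in grouped[row["carType"]]:  # skip exact duplicate rows
            grouped[row["carType"]].append(rest)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample data, not real scraped output.
    rows = [
        {"carType": "Toyota", "model": "Vios", "year": "2015"},
        {"carType": "Honda", "model": "City", "year": "2016"},
        {"carType": "Toyota", "model": "Camry", "year": "2014"},
        {"carType": "Toyota", "model": "Vios", "year": "2015"},  # exact duplicate
    ]
    for car_type, details in merge_by_car_type(rows).items():
        print(car_type, details)
```

Something along these lines could presumably run inside an item pipeline or as a post-processing step over the exported feed, but I am not sure which is the right place for it.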