I don't see any errors in the logs I'm getting, but only 108 items were scraped even though there are many more items to collect. So I suspect this is a pagination problem, but I don't know how to fix it.
Here is my shortened spider (a generic Scrapy pagination pattern is included after it for comparison):
import logging

import scrapy


class AllbooksSpider(scrapy.Spider):
    name = 'allbooks'
    allowed_domains = ['www.digikala.com']

    def start_requests(self):
        yield scrapy.Request(url='https://www.digikala.com/search/category-book',
                             callback=self.parse)

    def parse(self, response):
        original_price = 0
        try:
            for product in response.xpath("//ul[@class='c-listing__items js-plp-products-list']/li"):
                title = product.xpath(".//div/div[2]/div/div/a/text()").get()
                if product.xpath(".//div/div[2]/div[3]/div/div/del/text()"):
                    original_price = int(product.xpath(".//div/div[2]/div[3]/div/div/del/text()").get().strip().replace(',', ''))
                    # discounted_price is extracted in the full spider; that part
                    # was cut when I shortened the code for this post
                    discounted_amount = original_price - discounted_price
                else:
                    original_price = "not available"
                    discounted_amount = "not available"
                yield {
                    'title': title,
                    'discounted_amount': discounted_amount,
                }
            # Follow the next page of results, if there is one
            next_page = response.xpath('//*[@class="c-pager__item"]/../following-sibling::*//@href').extract_first()
            if next_page:
                yield scrapy.Request(response.urljoin(next_page))
        except AttributeError:
            logging.error("The element didn't exist")
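For comparison, this is the generic next-page pattern from the Scrapy tutorial that I was trying to adapt. The li.next selector here is just the tutorial's placeholder, not Digikala's actual pager markup:

# Generic pagination sketch in the Scrapy tutorial style; the CSS selector
# is a placeholder and must be replaced with the site's real pager markup.
next_page = response.css('li.next a::attr(href)').get()
if next_page is not None:
    # response.follow resolves relative URLs and schedules the next page
    # with the same parse callback
    yield response.follow(next_page, callback=self.parse)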
Can you help me understand what the problem is and how to fix it?