I have a problem with Scrapy: the spider below returns an empty item when I run scrapy crawl panini. The spider code is:
class PaniniSpider(scrapy.Spider):
    name = "panini"
    start_url = ["http://comics.panini.it/store/pub_ita_it/magazines.html"]

    # products-list
    def parse(self, response):
        # Get all the <a> tags
        item = ComicscraperItem()
        item['title'] = response.xpath('//*[@id="products-list"]/div/div[2]/h3/a/text()').extract()
        item['link'] = response.xpath('//*[@id="products-list"]/div/div[2]/h3/a/@href').extract()
        yield item
This is the output the crawl returns:
2019-08-03 21:10:08 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-08-03 21:10:08 [scrapy.core.engine] INFO: Spider opened
2019-08-03 21:10:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-08-03 21:10:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-08-03 21:10:08 [scrapy.core.engine] INFO: Closing spider (finished)
2019-08-03 21:10:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.010107,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2019, 8, 3, 19, 10, 8, 112158),
'log_count/INFO': 10,
'start_time': datetime.datetime(2019, 8, 3, 19, 10, 8, 102051)}
2019-08-03 21:10:08 [scrapy.core.engine] INFO: Spider closed (finished)
If I load the same site in the Scrapy shell and run response.xpath('//*[@id="products-list"]/div/div[2]/h3/a/text()').extract() in the terminal, it returns the correct results!
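(For context, the shell test was along these lines; the output shown is only a placeholder for the non-empty list of titles described above:)

    $ scrapy shell "http://comics.panini.it/store/pub_ita_it/magazines.html"
    >>> response.xpath('//*[@id="products-list"]/div/div[2]/h3/a/text()').extract()
    ['...', '...']   # non-empty list of titles when run in the shell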
I think the problem is in the XPath for the links, but I can't figure out where!
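For comparison, here is a minimal sketch of the same spider using Scrapy's conventional attribute name start_urls (plural). The log above shows "Crawled 0 pages" and no download stats at all, which is consistent with no request ever being made, and that is what happens when start_urls is empty or misnamed, independent of the XPath. The import path for ComicscraperItem is an assumption based on a standard Scrapy project layout.

    import scrapy

    from ..items import ComicscraperItem  # assumed location of the item class


    class PaniniSpider(scrapy.Spider):
        name = "panini"
        # Scrapy's default start_requests() reads start_urls (plural);
        # with a misnamed attribute the list stays empty and nothing is crawled.
        start_urls = ["http://comics.panini.it/store/pub_ita_it/magazines.html"]

        def parse(self, response):
            item = ComicscraperItem()
            item['title'] = response.xpath('//*[@id="products-list"]/div/div[2]/h3/a/text()').extract()
            item['link'] = response.xpath('//*[@id="products-list"]/div/div[2]/h3/a/@href').extract()
            yield item

With that change scrapy crawl panini should at least request the page; whether the item fields are filled then depends only on the XPath, which already works in the shell.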