Concatenated XPath returns an empty item


I have a problem with Scrapy: the spider below returns an empty item when I run it with scrapy crawl panini. The spider code is:

class PaniniSpider(scrapy.Spider):
    name = "panini"
    start_url = ["http://comics.panini.it/store/pub_ita_it/magazines.html"]

    # products-list
    def parse(self, response):
        # Get all the <a> tags
        item = ComicscraperItem()
        item['title'] = response.xpath('//*[@id="products-list"]/div/div[2]/h3/a/text()').extract()
        item['link'] = response.xpath('//*[@id="products-list"]/div/div[2]/h3/a/@href').extract()
        yield item

This is what the crawl returns:

2019-08-03 21:10:08 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-08-03 21:10:08 [scrapy.core.engine] INFO: Spider opened
2019-08-03 21:10:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-08-03 21:10:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-08-03 21:10:08 [scrapy.core.engine] INFO: Closing spider (finished)
2019-08-03 21:10:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.010107,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 8, 3, 19, 10, 8, 112158),
 'log_count/INFO': 10,
 'start_time': datetime.datetime(2019, 8, 3, 19, 10, 8, 102051)}
2019-08-03 21:10:08 [scrapy.core.engine] INFO: Spider closed (finished)

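If I load the same page in the Scrapy shell and type response.xpath('//*[@id="products-list"]/div/div[2]/h3/a/text()').extract() in the terminal, it returns the correct result!

I think the problem is in the concatenated XPath, but I can't see where!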

慕侠2389804

Views: 168, answers: 1

1 answer

浮云间

Try to use the attributes that are available, such as class or id, when scraping; it will make your life easier. Try the following test code:

for sel in response.xpath("//div[@class='list-group']//h3/a"):
    print(sel.xpath('./text()').extract_first().strip())
    print(sel.xpath('./@href').extract_first())

Edit: a better version of the code above:

for sel in response.xpath("//h3[@class='product-name']/a"):
    print(sel.xpath('./@title').extract_first())
    print(sel.xpath('./@href').extract_first())
