猿问

Blank lines throughout Scrapy's response output make it impossible to format the output

I want to remove the [ ] brackets that Scrapy adds around all of its output. To do that, you just add [0] at the end of the xpath statement, like this:


'a[@class="question-hyperlink"]/text()').extract()[0]

This solves the [] problem in some cases, but in other cases Scrapy returns every second line of output as blank, so when it reaches such a line while using [0], I get the error:


IndexError: list index out of range

How do I prevent Scrapy from creating these blank lines? This seems to be a common problem, but everyone else runs into it when exporting to CSV, whereas for me it happens with the Scrapy response itself, before exporting to CSV.
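For context (this snippet is not part of the original question): `extract()` returns a list of strings, so `[0]` raises `IndexError` whenever the XPath matched nothing. Scrapy's `SelectorList.extract_first()` returns a default value instead; a plain-Python sketch of that behavior:

```python
# Minimal sketch mimicking Scrapy's SelectorList.extract_first():
# return the first matched string, or a default instead of raising
# IndexError on an empty selection.
def extract_first(extracted, default=None):
    return extracted[0] if extracted else default

matched = ["How do I fix this?"]   # what extract() returns on a hit
blank = []                         # what extract() returns on a blank row

print(extract_first(matched))  # the bare string, no [ ] brackets
print(extract_first(blank))    # None instead of an IndexError
```

In Scrapy 1.0+ the same effect comes from calling `.extract_first()` (or `.get()` in newer releases) in place of `.extract()[0]`.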


items.py:


import scrapy

from scrapy.item import Item, Field


class QuestionItem(Item):
    title = Field()
    url = Field()


class PopularityItem(Item):
    votes = Field()
    answers = Field()
    views = Field()


class ModifiedItem(Item):
    lastModified = Field()
    modName = Field()

The spider that does not output every other line as blank, and therefore works with [0]:


from scrapy import Spider
from scrapy.selector import Selector

from stack.items import QuestionItem


class QuestionSpider(Spider):
    name = "questions"
    allowed_domains = ["stackoverflow.com"]
    start_urls = [
        "http://stackoverflow.com/questions?pagesize=50&sort=newest",
    ]

    def parse(self, response):
        questions = Selector(response).xpath('//div[@class="summary"]/h3')

        for question in questions:
            item = QuestionItem()
            item['title'] = question.xpath(
                'a[@class="question-hyperlink"]/text()').extract()[0]
            item['url'] = question.xpath(
                'a[@class="question-hyperlink"]/@href').extract()[0]
            yield item

The spider that outputs every other line as blank:


from scrapy import Spider
from scrapy.selector import Selector

from stack.items import PopularityItem


class PopularitySpider(Spider):
    name = "popularity"
    allowed_domains = ["stackoverflow.com"]
    start_urls = [
        "https://stackoverflow.com/",
    ]

    def parse(self, response):
        popularity = response.xpath('//div[contains(@class, "question-summary narrow")]/div')

        for poppart in popularity:
            # (the loop body was cut off in the original post; presumably a
            # PopularityItem is built and yielded here)
            ...

Asked by 人到中年有点甜 · 176 views · 1 answer