Scrapy is not parsing items

I am trying to scrape a web page that uses pagination, but the callback does not parse any items. Any help would be appreciated. Here is the code:


# -*- coding: utf-8 -*-
import scrapy
from ..items import EscrotsItem


class Escorts(scrapy.Spider):
    name = 'escorts'
    allowed_domains = ['www.escortsandbabes.com.au']
    start_urls = ['https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/']

    def parse_links(self, response):
        for i in response.css('.btn.btn-default.btn-block::attr(href)').extract()[2:]:
            yield scrapy.Request(url=response.urljoin(i),callback=self.parse)
        NextPage = response.css('.page.next-page::attr(href)').extract_first()
        if NextPage:
            yield scrapy.Request(
                url=response.urljoin(NextPage),
                callback=self.parse_links)

    def parse(self, response):
        for x in response.xpath('//div[@class="advertiser-profile"]'):
            item = EscrotsItem()
            item['Name'] = x.css('.advertiser-names--display-name::text').extract_first()
            item['Username'] = x.css('.advertiser-names--username::text').extract_first()
            item['Phone'] = x.css('.contact-number::text').extract_first()
            yield item


拉风的咖菲猫
145 views, 1 answer
1 Answer

温温酱

Scrapy requests the URLs in start_urls and passes each response to parse, which is the default callback. The start page has no div.advertiser-profile elements, so the spider closes without yielding any results, exactly as it should. Your parse_links function is never called at all.

Swap the function names:

import scrapy


class Escorts(scrapy.Spider):
    name = 'escorts'
    allowed_domains = ['escortsandbabes.com.au']
    start_urls = ['https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/']

    def parse(self, response):
        # Default callback: runs on the listing page and follows the profile links.
        for i in response.css('.btn.btn-default.btn-block::attr(href)').extract()[2:]:
            yield scrapy.Request(response.urljoin(i), self.parse_links)
        next_page = response.css('.page.next-page::attr(href)').get()
        if next_page:
            # No callback given, so the next listing page is handled by parse again.
            yield scrapy.Request(response.urljoin(next_page))

    def parse_links(self, response):
        # Runs on each profile page and extracts the item fields.
        for x in response.xpath('//div[@class="advertiser-profile"]'):
            item = {}
            item['Name'] = x.css('.advertiser-names--display-name::text').get()
            item['Username'] = x.css('.advertiser-names--username::text').get()
            item['Phone'] = x.css('.contact-number::text').get()
            yield item

Log from my scrapy shell:

In [1]: fetch("https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/")
2019-03-29 15:22:56 [scrapy.core.engine] INFO: Spider opened
2019-03-29 15:23:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/> (referer: None, latency: 2.48 s)

In [2]: response.css('.page.next-page::attr(href)').get()
Out[2]: u'/Directory/ACT/Canberra/2600/Any/All/?p=2'
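To make the dispatch rule concrete, here is a rough, Scrapy-free sketch of it. Everything in it is hypothetical and only for illustration: the Request class, the crawl function, the PAGES dictionary, and both toy spiders are stand-ins, not Scrapy's actual implementation. The only behaviour it tries to mirror is that a request without a callback is handled by parse.

```python
from collections import deque


class Request:
    """Toy stand-in for scrapy.Request: a URL plus an optional callback."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


def crawl(spider, pages):
    """Toy engine loop (hypothetical): run each request's callback.

    A request without a callback falls back to spider.parse, which is
    the default-callback rule this sketch is meant to illustrate.
    """
    queue = deque(Request(url) for url in spider.start_urls)
    seen, items = set(), []
    while queue:
        request = queue.popleft()
        if request.url in seen:
            continue
        seen.add(request.url)
        callback = request.callback or spider.parse
        for result in callback(request.url, pages[request.url]):
            if isinstance(result, Request):
                queue.append(result)
            else:
                items.append(result)
    return items


# Fake site: two listing pages that link to three profile pages.
PAGES = {
    "/list?p=1": {"profiles": ["/profile/a", "/profile/b"], "next": "/list?p=2"},
    "/list?p=2": {"profiles": ["/profile/c"], "next": None},
    "/profile/a": {"name": "a"},
    "/profile/b": {"name": "b"},
    "/profile/c": {"name": "c"},
}


class BrokenSpider:
    """Same layout as the question: parse extracts, parse_links follows."""

    start_urls = ["/list?p=1"]

    def parse_links(self, url, page):
        # Never reached, because nothing ever requests it as a callback.
        for href in page.get("profiles", []):
            yield Request(href, callback=self.parse)
        if page.get("next"):
            yield Request(page["next"], callback=self.parse_links)

    def parse(self, url, page):
        # Default callback: runs on the listing page, which has no profile data.
        if "name" in page:
            yield {"Name": page["name"]}


class FixedSpider:
    """Roles swapped: parse follows links, a named callback extracts."""

    start_urls = ["/list?p=1"]

    def parse(self, url, page):
        for href in page.get("profiles", []):
            yield Request(href, callback=self.parse_profile)
        if page.get("next"):
            yield Request(page["next"])  # no callback, so parse handles it

    def parse_profile(self, url, page):
        yield {"Name": page["name"]}


broken_items = crawl(BrokenSpider(), PAGES)
fixed_items = crawl(FixedSpider(), PAGES)
print(broken_items)
print(fixed_items)
```

In this toy model the broken layout collects nothing, because the listing page goes to parse and no request ever names parse_links, while the fixed layout collects one item per profile page. In a real spider you could probably also keep your original method names and set the callback of the first request explicitly, but renaming is the smaller change.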