Creating class instance variables in a Scrapy spider

I'm new to Python. I want to create my own instance variables variable_1, variable_2 in a Scrapy spider class. The following code runs fine:


import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


class SpiderTest1(scrapy.Spider):
    name       = 'main run'
    url        = 'url example'  # this class variable works fine
    variable_1 = 'info_1'       # this class variable works fine
    variable_2 = 'info_2'       # this class variable works fine

    def start_requests(self):
        urls = [self.url]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        print(f'some process with {self.variable_1}')
        print(f'some process with {self.variable_2}')


# start running the spider
process = CrawlerProcess(get_project_settings())
process.crawl(SpiderTest1())
process.start()

But I want to make them instance variables, so I don't have to edit the values inside the spider before every run. I decided to add def __init__(self, url, variable_1, variable_2) to the Scrapy spider, and I was hoping to run it with SpiderTest1(url, variable_1, variable_2). Below is the new code, which I expected to behave like the code above, but it doesn't work:


class SpiderTest1(scrapy.Spider):
    name = 'main run'

    # the following __init__ is the new change, but it is not working
    def __init__(self, url, variable_1, variable_2):
        self.url = url
        self.variable_1 = variable_1
        self.variable_2 = variable_2

    def start_requests(self):
        urls = [self.url]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        print(f'some process with {self.variable_1}')
        print(f'some process with {self.variable_2}')


# input values for the variables
url        = 'url example'
variable_1 = 'info_1'
variable_2 = 'info_2'

# start running the spider
process = CrawlerProcess(get_project_settings())
process.crawl(SpiderTest1(url, variable_1, variable_2))  # this line doesn't seem to work
process.start()

The result:


TypeError: __init__() missing 3 required positional arguments: 'url', 'variable_1', and 'variable_2'

I'd appreciate it if anyone could tell me how to achieve this.


慕后森
2 Answers

潇湘沐

According to Common Practices and the API documentation, you should call the crawl method like this, passing the constructor arguments after the spider class:

process = CrawlerProcess(get_project_settings())
process.crawl(SpiderTest1, url, variable_1, variable_2)
process.start()

Update: the documentation also mentions this form of running a spider:

process.crawl('followall', domain='scrapinghub.com')

In that example, 'followall' is the name of a spider in the project (i.e. the value of the spider class's name attribute). In your particular case, where the spider is defined as:

class SpiderTest1(scrapy.Spider):
    name = 'main run'
    ...

you would run the spider by its name with this code:

process = CrawlerProcess(get_project_settings())
process.crawl('main run', url, variable_1, variable_2)
process.start()
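The key point above is that crawl wants the class plus the arguments, not an already-built instance, because the framework constructs the spider itself. A minimal plain-Python sketch of that pattern (no Scrapy needed; the crawl function here is a hypothetical stand-in for CrawlerProcess.crawl, not Scrapy's actual implementation):

```python
class Spider:
    def __init__(self, url, variable_1, variable_2):
        self.url = url
        self.variable_1 = variable_1
        self.variable_2 = variable_2

def crawl(spidercls, *args, **kwargs):
    # The framework receives the class and the arguments separately,
    # then calls the constructor itself. Passing an instance instead
    # of a class would leave the framework nothing to instantiate.
    return spidercls(*args, **kwargs)

spider = crawl(Spider, 'url example', 'info_1', 'info_2')
print(spider.variable_1)  # info_1
```

This is why process.crawl(SpiderTest1(url, ...)) fails while process.crawl(SpiderTest1, url, ...) works: in the failing version, Python evaluates SpiderTest1(url, ...) first, and it is the framework (which calls the constructor again internally) that raises the TypeError.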

繁星coding

Thanks, my code works fine your way. But I found something slightly different from Common Practices. This is our code:

process.crawl(SpiderTest1, url, variable_1, variable_2)

And this is from Common Practices:

process.crawl('followall', domain='scrapinghub.com')

The first variant you suggested uses the class name SpiderTest1, but the other uses the string 'followall'. What does 'followall' refer to? The file testspiders/testspiders/spiders/followall.py, or just the class attribute name = 'followall' inside followall.py? I'm asking because I'm still confused about when to pass a string and when to pass a class name to a Scrapy spider. Thanks.
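To the question above: the string refers to the spider's name class attribute, not the file name. Scrapy's spider loader scans the modules listed in SPIDER_MODULES and builds a mapping from each class's name to the class, so either form reaches the same constructor. A toy sketch of that lookup (plain Python; the registry dict and this crawl function are illustrative stand-ins, not Scrapy's real spider loader):

```python
class SpiderTest1:
    name = 'main run'

class FollowAllSpider:
    name = 'followall'

# A stand-in for the spider loader: map each class's `name`
# attribute to the class itself, regardless of file name.
registry = {cls.name: cls for cls in (SpiderTest1, FollowAllSpider)}

def crawl(crawler_or_spidercls, *args, **kwargs):
    # Accept either a spider class or a spider-name string,
    # mirroring the two calling styles from the thread.
    if isinstance(crawler_or_spidercls, str):
        crawler_or_spidercls = registry[crawler_or_spidercls]
    return crawler_or_spidercls(*args, **kwargs)

print(type(crawl('main run')).__name__)   # SpiderTest1
print(type(crawl(SpiderTest1)).__name__)  # SpiderTest1
```

So process.crawl('main run', ...) and process.crawl(SpiderTest1, ...) are interchangeable as long as the spider's module is discoverable through the project settings; the string form is convenient when the class isn't imported in the running script.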
