How do I fix the scrapy-redis idle-loop problem?

In the scrapy-redis framework, the xxx:requests queue stored in Redis has been fully crawled, but the program keeps running. How can I make it stop automatically instead of spinning idle?
2017-07-03 09:17:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-03 09:18:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
[For reference only] You can stop the program by calling engine.close_spider(spider, 'reason'), for example in the scheduler:

# scheduler.py
def next_request(self):
    block_pop_timeout = self.idle_before_close
    request = self.queue.pop(block_pop_timeout)
    if request and self.stats:
        self.stats.inc_value('scheduler/dequeued/redis', spider=self.spider)
    if request is None:
        self.spider.crawler.engine.close_spider(self.spider, 'queue is empty')
    return request
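Note that block_pop_timeout comes from the scheduler's idle_before_close attribute, which scrapy-redis reads from the SCHEDULER_IDLE_BEFORE_CLOSE setting; with the scheduler patch above, the pop blocks that many seconds before returning None and closing the spider. A possible project configuration (the value 10 is only illustrative):

```python
# settings.py (illustrative values, not a recommendation)
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Seconds queue.pop() blocks waiting for a request before returning None;
# with the next_request patch above, None then triggers close_spider.
SCHEDULER_IDLE_BEFORE_CLOSE = 10
```
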
# Alternatively, you can do it in the spiders.py module of scrapy_redis:
def next_requests(self):
    """Returns a request to be scheduled or none."""
    use_set = self.settings.getbool('REDIS_START_URLS_AS_SET', defaults.START_URLS_AS_SET)
    fetch_one = self.server.spop if use_set else self.server.lpop
    # XXX: Do we need to use a timeout here?
    found = 0
    # TODO: Use redis pipeline execution.
    while found < self.redis_batch_size:
        data = fetch_one(self.redis_key)
        if not data:
            # Queue empty.
            print('+++++ queue is empty')
            self.crawler.engine.close_spider(self.spider, 'queue is empty')
            break
        req = self.make_request_from_data(data)
        if req:
            yield req
            found += 1
        else:
            self.logger.debug("Request not made from data: %r", data)
    if found:
        self.logger.debug("Read %s requests from '%s'", found, self.redis_key)
There is one more thing I don't understand: when the spider is closed via engine.close_spider(spider, 'reason'), a few errors appear before it actually shuts down.
# Normal shutdown
2017-07-03 18:02:38 [scrapy.core.engine] INFO: Closing spider (queue is empty)
2017-07-03 18:02:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'queue is empty',
 'finish_time': datetime.datetime(2017, 7, 3, 10, 2, 38, 616021),
 'log_count/INFO': 8,
 'start_time': datetime.datetime(2017, 7, 3, 10, 2, 38, 600382)}
2017-07-03 18:02:38 [scrapy.core.engine] INFO: Spider closed (queue is empty)
# After that, a few more errors appear before the spider closes. Could it be that the spider
# starts several threads crawling together, and once one thread closes the spider the
# others can no longer find it and raise errors?
Unhandled Error
Traceback (most recent call last):
  File "D:/papp/project/launch.py", line 37, in <module>
    process.start()
  File "D:\Program Files\python3\lib\site-packages\scrapy\crawler.py", line 285, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "D:\Program Files\python3\lib\site-packages\twisted\internet\base.py", line 1243, in run
    self.mainLoop()
  File "D:\Program Files\python3\lib\site-packages\twisted\internet\base.py", line 1252, in mainLoop
    self.runUntilCurrent()
------
  File "D:\Program Files\python3\lib\site-packages\twisted\internet\base.py", line 878, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "D:\Program Files\python3\lib\site-packages\scrapy\utils\reactor.py", line 41, in __call__
    return self._func(*self._a, **self._kw)
  File "D:\Program Files\python3\lib\site-packages\scrapy\core\engine.py", line 137, in _next_request
    if self.spider_is_idle(spider) and slot.close_if_idle:
  File "D:\Program Files\python3\lib\site-packages\scrapy\core\engine.py", line 189, in spider_is_idle
    if self.slot.start_requests is not None:
builtins.AttributeError: 'NoneType' object has no attribute 'start_requests'
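A likely explanation (and it is not multiple crawl threads: Scrapy runs a single-threaded Twisted reactor): _next_request is scheduled on the reactor as a delayed callback, so calls queued while the engine was still open can fire after close_spider has set engine.slot to None, and then spider_is_idle trips over the missing slot. A minimal sketch of that ordering, with a plain list standing in for the reactor's callback queue (the Engine class here is a toy, not Scrapy's):

```python
class Engine:
    """Toy stand-in for scrapy.core.engine.ExecutionEngine."""

    def __init__(self):
        self.slot = object()   # stands in for the engine's Slot

    def close_spider(self):
        self.slot = None       # Scrapy clears the slot on close

    def next_request(self):
        # Mirrors spider_is_idle touching self.slot.start_requests
        if self.slot is None:
            raise AttributeError(
                "'NoneType' object has no attribute 'start_requests'")
        return "request"


engine = Engine()
# Two heartbeat callbacks were queued while the slot still existed...
queued = [engine.next_request, engine.next_request]
engine.close_spider()          # ...then the spider was closed
for call in queued:
    try:
        call()
    except AttributeError as exc:
        print("late callback failed:", exc)
```

So the errors after "Spider closed" are stale reactor callbacks running against an already-closed engine, not other threads losing the spider; they are noisy but harmless once the process exits.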