Error when running scrapy crawl douban_spider

Source: 2-6 Writing the spider file (1)

Lxp1245121592

2019-08-20 21:01

lxp:douban lixiaopeng$ scrapy crawl douban_spider

2019-08-20 20:57:19 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: douban)

2019-08-20 20:57:19 [scrapy.utils.log] INFO: Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.7.0, Python 3.7.0b1 (v3.7.0b1:9561d7f501, Jan 30 2018, 16:11:47) - [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1c  28 May 2019), cryptography 2.7, Platform Darwin-18.7.0-x86_64-i386-64bit

2019-08-20 20:57:19 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'douban', 'DOWNLOAD_DELAY': 0.5, 'NEWSPIDER_MODULE': 'douban.spiders', 'SPIDER_MODULES': ['douban.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}

2019-08-20 20:57:20 [scrapy.extensions.telnet] INFO: Telnet Password: 24a1717313d85e58

2019-08-20 20:57:20 [scrapy.middleware] INFO: Enabled extensions:

['scrapy.extensions.corestats.CoreStats',

 'scrapy.extensions.telnet.TelnetConsole',

 'scrapy.extensions.memusage.MemoryUsage',

 'scrapy.extensions.logstats.LogStats']

Unhandled error in Deferred:

2019-08-20 20:57:20 [twisted] CRITICAL: Unhandled error in Deferred:


Traceback (most recent call last):

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 184, in crawl

    return self._crawl(crawler, *args, **kwargs)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 188, in _crawl

    d = crawler.crawl(*args, **kwargs)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator

    return _cancellableInlineCallbacks(gen)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks

    _inlineCallbacks(None, g, status)

--- <exception caught here> ---

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks

    result = g.send(result)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 86, in crawl

    self.engine = self._create_engine()

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 111, in _create_engine

    return ExecutionEngine(self, lambda _: self.stop())

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/core/engine.py", line 69, in __init__

    self.downloader = downloader_cls(crawler)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/core/downloader/__init__.py", line 86, in __init__

    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/middleware.py", line 53, in from_crawler

    return cls.from_settings(crawler.settings, crawler)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings

    mwcls = load_object(clspath)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/utils/misc.py", line 46, in load_object

    mod = import_module(module)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/__init__.py", line 127, in import_module

    return _bootstrap._gcd_import(name[level:], package, level)

  File "<frozen importlib._bootstrap>", line 994, in _gcd_import

    

  File "<frozen importlib._bootstrap>", line 971, in _find_and_load

    

  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked

    

  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked

    

  File "<frozen importlib._bootstrap_external>", line 723, in exec_module

    

  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed

    

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/downloadermiddlewares/httpproxy.py", line 5, in <module>

    from urllib2 import _parse_proxy

builtins.SyntaxError: invalid syntax (urllib2.py, line 220)


2019-08-20 20:57:20 [twisted] CRITICAL: 

Traceback (most recent call last):

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks

    result = g.send(result)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 86, in crawl

    self.engine = self._create_engine()

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 111, in _create_engine

    return ExecutionEngine(self, lambda _: self.stop())

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/core/engine.py", line 69, in __init__

    self.downloader = downloader_cls(crawler)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/core/downloader/__init__.py", line 86, in __init__

    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/middleware.py", line 53, in from_crawler

    return cls.from_settings(crawler.settings, crawler)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings

    mwcls = load_object(clspath)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/utils/misc.py", line 46, in load_object

    mod = import_module(module)

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/__init__.py", line 127, in import_module

    return _bootstrap._gcd_import(name[level:], package, level)

  File "<frozen importlib._bootstrap>", line 994, in _gcd_import

  File "<frozen importlib._bootstrap>", line 971, in _find_and_load

  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked

  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked

  File "<frozen importlib._bootstrap_external>", line 723, in exec_module

  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/downloadermiddlewares/httpproxy.py", line 5, in <module>

    from urllib2 import _parse_proxy

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib2.py", line 220

    raise AttributeError, attr

                        ^

SyntaxError: invalid syntax



1 Answer

  • 果冻gg
    2019-09-07 11:17:49

    urllib2 does not exist in Python 3 (its functionality was merged into urllib.request), so the import on line 5 of httpproxy.py fails. Hold Ctrl and click the `File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/downloadermiddlewares/httpproxy.py", line 5, in <module>` line in the traceback to open httpproxy.py, remove the urllib2-related import on line 5, and save.
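
    For reference, in Scrapy 1.x the failing import sits inside a compatibility guard at the top of httpproxy.py. The sketch below is an approximation of that guard (the exact surrounding lines are an assumption) and shows the Python 3 import to keep after the edit:

    ```python
    # scrapy/downloadermiddlewares/httpproxy.py -- top of file (Scrapy 1.x, approximate)
    # The guard is intended to fall back to urllib.request on Python 3:
    #
    #     try:
    #         from urllib2 import _parse_proxy          # Python 2 location
    #     except ImportError:
    #         from urllib.request import _parse_proxy   # Python 3 location
    #
    # Here the first import raises SyntaxError rather than ImportError (a
    # Python-2-only urllib2.py is present in site-packages, as the end of the
    # traceback shows), so the fallback never runs. Keeping only the Python 3
    # import avoids the crash:
    from urllib.request import _parse_proxy
    ```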
