
Help needed: Scrapy fails to scrape data, and repeated debugging has not fixed it

Goal: scrape course information from a learning website. For early debugging I only fetch the course title.

Spider file:
import scrapy
from xtzx.items import XtzxItem
from scrapy.http import Request

class LessonSpider(scrapy.Spider):
    name = 'lesson'
    allowed_domains = ['xuetangx.com']
    start_urls = ['http://www.xuetangx.com/courses/course-v1:TsinghuaX+80512073X+2018_T1/about']
    '''
    def start_requests(self):
        ua = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"}
        yield Request("www.xuetangx.com/courses/course-v1:TsinghuaX+80512073X+2018_T1/about", headers=ua)
    '''
    def parse(self, response):
        item = XtzxItem()
        item["title"] = response.xpath("//div[@class='title_detail'/h3[@class='courseabout_title']/text()").extract()
        print(item["title"])

Execution log:
2018-04-28 11:08:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: xtzx)
2018-04-28 11:08:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 17.9.0, Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.2.2, Platform Windows-10-10.0.16299-SP0
2018-04-28 11:08:33 [scrapy.crawler] INFO: Overridden settings: {'SPIDER_MODULES': ['xtzx.spiders'], 'BOT_NAME': 'xtzx', 'NEWSPIDER_MODULE': 'xtzx.spiders', 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'}
2018-04-28 11:08:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2018-04-28 11:08:34 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-04-28 11:08:34 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-04-28 11:08:34 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-04-28 11:08:34 [scrapy.core.engine] INFO: Spider opened
---------- The problem seems to start here
2018-04-28 11:08:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-04-28 11:08:34 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-04-28 11:08:34 [scrapy.core.engine] DEBUG: Crawled (200) (referer: None)
2018-04-28 11:08:34 [scrapy.core.scraper] ERROR: Spider error processing (referer: None)
Traceback (most recent call last):
  File "d:\python3.5\lib\site-packages\parsel\selector.py", line 228, in xpath
    **kwargs)
  File "src\lxml\etree.pyx", line 1577, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid predicate

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\python3.5\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "E:\python\xtzx\xtzx\spiders\lesson.py", line 16, in parse
    item["title"] = response.xpath("//div[@class='title_detail'/h3[@class='courseabout_title']/text()").extract()
  File "d:\python3.5\lib\site-packages\scrapy\http\response\text.py", line 119, in xpath
    return self.selector.xpath(query, **kwargs)
  File "d:\python3.5\lib\site-packages\parsel\selector.py", line 232, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "d:\python3.5\lib\site-packages\six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "d:\python3.5\lib\site-packages\parsel\selector.py", line 228, in xpath
    **kwargs)
  File "src\lxml\etree.pyx", line 1577, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
ValueError: XPath error: Invalid predicate in //div[@class='title_detail'/h3[@class='courseabout_title']/text()
2018-04-28 11:08:35 [scrapy.core.engine] INFO: Closing spider (finished)
2018-04-28 11:08:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 301,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 24409,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 4, 28, 3, 8, 35, 118088),
 'log_count/DEBUG': 2,
 'log_count/ERROR': 1,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'spider_exceptions/ValueError': 1,
 'start_time': datetime.datetime(2018, 4, 28, 3, 8, 34, 418003)}
2018-04-28 11:08:35 [scrapy.core.engine] INFO: Spider closed (finished)
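
The program feels very simple, but it just will not work. The items file uses the usual setup, nothing new was added to pipelines, and in settings I only changed the value of ROBOTSTXT_OBEY. I searched online for a long time and could not find a fix for this error. I also tried disguising the crawler as a browser, which did not help. I am self-taught with no teacher and completely stuck, so any help is appreciated.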
Helenr
Views: 1065, answers: 2
2 answers

哔哔one

Isn't the XPath missing a ] after div[@class='title_detail'? The predicate on div is never closed in this line:

    item["title"] = response.xpath("//div[@class='title_detail'/h3[@class='courseabout_title']/text()").extract()
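
A minimal sketch of the corrected path, assuming the page markup looks roughly like the sample below (the HTML is made up for illustration, not copied from xuetangx.com). To stay runnable without Scrapy installed, it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector. ElementTree supports only a subset of XPath and has no text() step, so the title is read from .text instead. In the spider itself, the equivalent change would be adding the ] so the expression reads //div[@class='title_detail']/h3[@class='courseabout_title']/text().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, made up for illustration; the real page may differ.
SAMPLE = """
<html><body>
  <div class="title_detail">
    <h3 class="courseabout_title">Sample Course</h3>
  </div>
</body></html>
"""

# Corrected path: the predicate on div is now closed with ']'.
# (ElementTree needs the leading '.' and does not support text().)
FIXED_PATH = ".//div[@class='title_detail']/h3[@class='courseabout_title']"


def extract_titles(markup):
    """Return the stripped text of every h3 matched by FIXED_PATH."""
    root = ET.fromstring(markup.strip())
    return [(h3.text or "").strip() for h3 in root.findall(FIXED_PATH)]


titles = extract_titles(SAMPLE)
print(titles)
```

If the list comes back empty on the real page, the class names in the live HTML are worth checking before suspecting anything else.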

杨__羊羊

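The end of the traceback already says it:

  File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
ValueError: XPath error: Invalid predicate in //div[@class='title_detail'/h3[@class='courseabout_title']/text()

The XPath is written incorrectly: it is missing the ] that should close div[@class='title_detail'.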
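
An "Invalid predicate" message does not say where the problem is, so a quick bracket check on the expression can help. The helper below is a hypothetical sketch (find_unclosed_brackets is not part of Scrapy, parsel, or lxml): it walks the string, skips quoted literals, and reports the index of every [ or ( that is never closed. It only checks bracket balance and does not validate XPath in any other way.

```python
def find_unclosed_brackets(xpath):
    """Return the indexes of '[' or '(' characters that are never closed.

    Quoted string literals are skipped, so brackets inside them are ignored.
    """
    pairs = {"]": "[", ")": "("}
    stack = []     # (char, index) of brackets that are currently open
    quote = None   # the active quote character while inside a literal
    for i, ch in enumerate(xpath):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "[(":
            stack.append((ch, i))
        elif ch in pairs:
            if stack and stack[-1][0] == pairs[ch]:
                stack.pop()
            # An unmatched closing bracket is ignored in this sketch.
    return [i for _, i in stack]


broken = "//div[@class='title_detail'/h3[@class='courseabout_title']/text()"
fixed = "//div[@class='title_detail']/h3[@class='courseabout_title']/text()"

print(find_unclosed_brackets(broken))
print(find_unclosed_brackets(fixed))
```

For the expression from the question this reports index 5, the [ right after //div, and an empty list once the ] is added.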