Just search for "xpath" directly in Chrome's extension store.
Hi, did you manage to solve this on your end?
"C:\Program Files\Python38\python.exe" C:/Users/Administrator/Desktop/study_python/douban/douban/main.py
Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/study_python/douban/douban/main.py", line 2, in <module>
cmdline.execute('scrapy crawl douban_sprider'.split())
File "C:\Program Files\Python38\lib\site-packages\scrapy\cmdline.py", line 112, in execute
settings = get_project_settings()
File "C:\Program Files\Python38\lib\site-packages\scrapy\utils\project.py", line 69, in get_project_settings
settings.setmodule(settings_module_path, priority='project')
File "C:\Program Files\Python38\lib\site-packages\scrapy\settings\__init__.py", line 287, in setmodule
module = import_module(module)
File "C:\Program Files\Python38\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'douban.settings'
Process finished with exit code 1
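For anyone hitting the same trace: get_project_settings() fails like this when main.py is not at the project root, so 'douban.settings' cannot be imported. A minimal launcher sketch, assuming main.py sits next to scrapy.cfg (the chdir/sys.path lines are my own workaround, not from this thread):

# main.py -- assumes this file sits next to scrapy.cfg at the project root
import os
import sys

from scrapy import cmdline

project_root = os.path.dirname(os.path.abspath(__file__))
os.chdir(project_root)            # scrapy resolves scrapy.cfg from the cwd
sys.path.insert(0, project_root)  # makes 'douban.settings' importable
cmdline.execute('scrapy crawl douban_sprider'.split())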
It should be a .crx extension; you can search for it yourself.
In the Elements panel, press Ctrl+F and you can search with XPath syntax directly.
Right, it really does keep only one line: there's a <br> between the two lines, so the XPath extraction returns a two-element list (one element per line), but the single loop in the video ends up keeping only the last item of the extracted list, i.e. the second line. I added another layer of handling in the loop and it works now; you can try it:
content = i_item.xpath(".//div[@class='info']//div[@class='bd']/p[1]/text()").extract() for i_content in content: print(i_content) for i in i_content: content_s = "".join(i.split('\n')) douban_item['introduce'] = content_s print(douban_item['introduce'])
You can also press F12 in the browser, select that line, right-click Copy, then Copy XPath.
Chrome extension: XPath Helper
Solved: using lxml's etree together with explicit transcoding takes care of it.
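For reference, a minimal sketch of the lxml etree + explicit transcoding approach, assuming a plain requests fetch; the URL, User-Agent, and title XPath here are illustrative, not from the thread:

import requests
from lxml import etree

resp = requests.get('https://movie.douban.com/top250',
                    headers={'User-Agent': 'Mozilla/5.0'})
# decode the raw bytes explicitly instead of trusting the declared charset
html = resp.content.decode(resp.apparent_encoding or 'utf-8', errors='replace')
tree = etree.HTML(html)
print(tree.xpath("//div[@class='hd']/a/span[1]/text()"))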
After transcoding, though, I found the crawled response is just JS and data; the page is generated dynamically. How do I crawl that?
Target page:
https://blog.csdn.net/yuhezheg/article/details/104404887
You're running it from the wrong directory.
Scrapy probably didn't install successfully.
from ..items import DoubanItem
Solved: the import has to be written like this... original link attached.
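For context, roughly where that import lives, assuming the standard layout scrapy startproject douban generates; apart from the import line, everything here is a sketch:

# douban/spiders/douban_sprider.py -- spiders/ is a subpackage of douban/,
# so items.py sits one package level up, hence the '..'
import scrapy

from ..items import DoubanItem


class DoubanSpriderSpider(scrapy.Spider):
    name = 'douban_sprider'
    start_urls = ['https://movie.douban.com/top250']

    def parse(self, response):
        for i_item in response.xpath("//div[@class='article']//ol/li"):
            douban_item = DoubanItem()
            # ... fill fields such as douban_item['introduce'] here ...
            yield douban_item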
Problem solved.
It's the PyCharm console: the output shows up as Unicode escapes. How do I fix that?
A 200 response means it's working fine.
content = i_item.xpath('.//div[@class="info"]/div/p[1]/text()').extract()
Try writing it like that. Don't lean too heavily on how other people write their XPath; try to simplify it yourself.
Could it be that your main.py is in the wrong place?
If it's a downloaded file, the file's encoding is wrong. If it's going into a database, check whether the database's encoding is the problem.
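If the escapes show up in a file Scrapy exported, setting FEED_EXPORT_ENCODING = 'utf-8' in settings.py is the usual fix; when writing JSON by hand, pass ensure_ascii=False. A small sketch of the hand-written case (the sample item is made up):

import json

items = [{'introduce': '1994 / 美国 / 犯罪 剧情'}]  # made-up sample item
# ensure_ascii=False keeps CJK text readable instead of \uXXXX escapes
with open('douban.json', 'w', encoding='utf-8') as f:
    json.dump(items, f, ensure_ascii=False, indent=2)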
XPath Helper
Sure, add me on WeChat to get it: 871994650
XPath Helper
//div[@class='article']//ol//li//div[@class='pic']//a//img//@src
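A quick self-contained way to sanity-check that expression with Scrapy's Selector; the HTML fragment is a stripped-down stand-in for the real page:

from scrapy import Selector

html = '''<div class="article"><ol><li>
<div class="pic"><a href="#"><img src="https://img.example/cover.jpg"></a></div>
</li></ol></div>'''
sel = Selector(text=html)
print(sel.xpath("//div[@class='article']//ol//li//div[@class='pic']//a//img//@src").getall())
# -> ['https://img.example/cover.jpg']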
Found the problem: the URL was missing top250.
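That is, something like this in the spider (the wrong value is my guess at what was there before):

# wrong: fetches douban's front page, so the Top 250 selectors match nothing
start_urls = ['https://movie.douban.com/']
# right
start_urls = ['https://movie.douban.com/top250']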