Set up a launcher file in PyCharm so you don't have to type the command in the CentOS shell every time you run.
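One common way to build such a launcher (a sketch, assuming the file is named main.py and sits next to scrapy.cfg in the project root; 'douban_spider' is this project's spider name) is to call Scrapy's own cmdline module:

```python
# main.py -- run this from PyCharm instead of typing the crawl command in a CentOS shell
import os

CRAWL_ARGS = ['scrapy', 'crawl', 'douban_spider']  # same words as the shell command

if os.path.exists('scrapy.cfg'):  # only meaningful inside the Scrapy project root
    from scrapy import cmdline
    cmdline.execute(CRAWL_ARGS)  # runs the spider in-process and exits
```

Pointing a PyCharm run configuration at this file reproduces the shell command with one click.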
Paste it here
Copy the User-Agent
Still an error: 403 is returned
Now run the Douban project again
Install
Generate the make file
Enter the Python installation directory and type:
Note down the directory shown at the bottom
Still doesn't work; Python needs to be recompiled and reinstalled
Run it again
Installation succeeded
Installation command
It errors out at the end; _sqlite3 needs to be installed
Output after launching
Run the Scrapy project
scrapy crawl <spider name>
USER_AGENT must be set in settings.py; its value can be found via the browser's F12 developer tools (under Request Headers)
Install sqlite
yum -y install sqlite*
Recompile Python
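A quick way to confirm the rebuild picked up the sqlite headers (run this with the freshly compiled interpreter):

```python
# if Python was compiled before the sqlite headers were installed, this import
# raises ModuleNotFoundError: No module named '_sqlite3'
import sqlite3

print(sqlite3.sqlite_version)  # version of the SQLite library linked in
```

If the import succeeds, Scrapy's crawl no longer hits the _sqlite3 error.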
Edit the settings file
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3627.0 Safari/537.36'
Start the spider from the command line
scrapy crawl douban_spider
Run the crawler
In the spider folder
scrapy crawl douban_spider
Change printf to print
Leave line 32 as it is: don't change it to 0.5, and don't remove the #
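The "line 32" note presumably refers to the DOWNLOAD_DELAY entry in the settings.py that Scrapy generates (the exact line number varies by Scrapy version); a sketch of the relevant excerpt, which per the note stays commented out:

```python
# excerpt from a generated settings.py -- leave this line commented out
# rather than setting it to 0.5
#DOWNLOAD_DELAY = 3
```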
import scrapy
from douban.items import DoubanItem  # adjust 'douban' to your project's package name

class DoubanSpiderSpider(scrapy.Spider):
    name = 'douban_spider'
    allowed_domains = ['movie.douban.com']
    start_urls = ['http://movie.douban.com/top250']

    # default parse method
    def parse(self, response):
        movie_list = response.xpath("//div[@class='article']//ol[@class='grid_view']/li")
        for i_item in movie_list:
            douban_item = DoubanItem()
            douban_item['serial_number'] = i_item.xpath(".//div[@class='item']//em/text()").extract_first()
            douban_item['movie_name'] = i_item.xpath(
                ".//div[@class='info']/div[@class='hd']/a/span[1]/text()").extract_first()
            content = i_item.xpath(".//div[@class='info']//div[@class='bd']/p[1]/text()").extract()
            for i_content in content:
                content_s = ''.join(i_content.split())
                douban_item['introduce'] = content_s
            douban_item['star'] = i_item.xpath(".//span[@class='rating_num']/text()").extract_first()
            douban_item['evaluate'] = i_item.xpath(".//div[@class='star']/span[4]/text()").extract_first()
            douban_item['describe'] = i_item.xpath(".//p[@class='quote']//span/text()").extract_first()
            yield douban_item
        # next-page rule: take the xpath of the "next" link
        next_link = response.xpath("//span[@class='next']/link/@href").extract()
        if next_link:
            next_link = next_link[0]
            yield scrapy.Request('http://movie.douban.com/top250' + next_link, callback=self.parse)
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'
./configure --prefix='<your Python installation directory>' --with-ssl
Modify the entry URL in douban_spider.py
scrapy crawl douban_spider
yum -y install sqlite*
Recompile Python 3
Set the USER_AGENT field in settings.py
scrapy crawl douban_spider — run the crawler (pass the spider name, not the .py filename)