Set up a launcher file in PyCharm so you don't have to type the command on CentOS for every run
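A common way to do this is a small main.py at the project root. This is a sketch, not from the original notes: the filename main.py, the function name, and the assumption that the file sits next to scrapy.cfg with a spider named douban_spider are all mine.

```python
# main.py -- launcher sketch so PyCharm can run the crawl directly
# (assumption: this file lives in the project root next to scrapy.cfg,
# and the spider's `name` attribute is 'douban_spider')

CRAWL_COMMAND = "scrapy crawl douban_spider"

def main():
    # scrapy.cmdline runs the same code path as typing the command in a
    # shell; imported lazily so this module loads even without Scrapy
    from scrapy import cmdline
    cmdline.execute(CRAWL_COMMAND.split())
```

Point a PyCharm Run configuration at this file and have it call main(); running it is then equivalent to typing the shell command.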
Paste it here
Copy the User-agent
Still getting an error: the site returns 403
Now run the Douban project again
Install
Generate the Makefile
Go into the Python installation directory and type:
Note down the directory shown at the bottom
Still failing; Python needs to be recompiled and reinstalled
Run again
Installation succeeded
The install command
It errors at the end: the _sqlite3 module needs to be installed
Output after startup
Run the Scrapy project
scrapy crawl <spider name>
In settings.py, set USER_AGENT; its value can be taken from the browser's F12 developer tools (under Request Headers)
Install sqlite
yum -y install sqlite*
Recompile Python
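After the rebuild, a quick check (my own sketch) that the freshly compiled interpreter picked up the _sqlite3 extension; run it with the new python3:

```python
# If this runs without "ModuleNotFoundError: No module named '_sqlite3'",
# the rebuild found the sqlite headers installed by yum
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
version = conn.execute("SELECT sqlite_version()").fetchone()[0]
print("sqlite", version)
conn.close()
```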
Edit the settings.py file
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3627.0 Safari/537.36'
Start the spider from the command line
scrapy crawl douban_spider
Run the spider
In the spiders folder:
scrapy crawl douban_spider
Fix the typo: printf → print
Leave line 32 of settings.py as-is: don't change the value to 0.5 and don't remove the #
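For reference, a sketch of the relevant region of settings.py: the template generated by `scrapy startproject` ships DOWNLOAD_DELAY commented out (around line 32), and per the note above it stays commented; only USER_AGENT is set.

```python
# settings.py fragment (sketch) -- the two points these notes touch
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3627.0 Safari/537.36'

# around line 32 of the generated template; leave commented out,
# and do not change the value to 0.5
#DOWNLOAD_DELAY = 3
```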
import scrapy

from douban.items import DoubanItem  # assumes the project module is named 'douban'


class DoubanSpiderSpider(scrapy.Spider):
    name = 'douban_spider'
    allowed_domains = ['movie.douban.com']
    start_urls = ['http://movie.douban.com/top250']

    # default parse callback
    def parse(self, response):
        movie_list = response.xpath("//div[@class='article']//ol[@class='grid_view']/li")
        for i_item in movie_list:
            douban_item = DoubanItem()
            douban_item['serial_number'] = i_item.xpath(".//div[@class='item']//em/text()").extract_first()
            douban_item['movie_name'] = i_item.xpath(
                ".//div[@class='info']/div[@class='hd']/a/span[1]/text()").extract_first()
            content = i_item.xpath(".//div[@class='info']//div[@class='bd']/p[1]/text()").extract()
            for i_content in content:
                content_s = ''.join(i_content.split())
                douban_item['introduce'] = content_s
            douban_item['star'] = i_item.xpath(".//span[@class='rating_num']/text()").extract_first()
            douban_item['evaluate'] = i_item.xpath(".//div[@class='star']/span[4]/text()").extract_first()
            douban_item['describe'] = i_item.xpath(".//p[@class='quote']//span/text()").extract_first()
            yield douban_item
        # next-page rule: take the href from the "next" (后页) link's xpath
        next_link = response.xpath("//span[@class='next']/link/@href").extract()
        if next_link:
            next_link = next_link[0]
            yield scrapy.Request('http://movie.douban.com/top250' + next_link, callback=self.parse)
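The ''.join(i_content.split()) idiom in parse() removes every whitespace character (spaces, tabs, newlines) from each scraped line. A standalone illustration with made-up sample text:

```python
# str.split() with no arguments splits on any run of whitespace and
# drops empty strings, so joining the pieces with '' strips all of it
raw = "  1994\t/ USA / Crime Drama \n"
cleaned = ''.join(raw.split())
print(cleaned)  # -> 1994/USA/CrimeDrama
```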
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'
./configure --prefix='<your Python installation directory>' --with-ssl
Change the entry URL in douban_spider.py
scrapy crawl douban_spider
yum -y install sqlite*
Recompile Python 3
Edit the USER_AGENT field in settings.py
scrapy crawl douban_spider  (run the spider; pass the spider's name, not the .py filename)