Retrieving data from a database in Scrapy


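In Scrapy, I am trying to retrieve data from a database. The data was scraped by one spider and written to the database in pipelines.py, and I want to use it in another spider. Specifically, I want to read the links from the database and use them in the start_requests function. I know this problem is also covered in "Scrapy: Get Start_Urls from Database by Pipeline". I tried to follow that example, but unfortunately it does not work. I don't know why, but I must have made a mistake somewhere.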

pipelines.py

    import sqlite3


    class HeurekaScraperPipeline:

        def __init__(self):
            self.create_connection()
            self.create_table()

        def create_connection(self):
            self.conn = sqlite3.connect('shops.db')
            self.curr = self.conn.cursor()

        def create_table(self):
            self.curr.execute("""DROP TABLE IF EXISTS shops_tb""")
            self.curr.execute("""create table shops_tb(
                product_name text,
                shop_name text,
                price text,
                link text
            )""")

        def process_item(self, item, spider):
            self.store_db(item)
            return item

        def store_db(self, item):
            self.curr.execute("""insert into shops_tb values (?, ?, ?, ?)""", (
                item['product_name'],
                item['shop_name'],
                item['price'],
                item['link'],
            ))
            self.conn.commit()

spider

    class Shops_spider(scrapy.Spider):
        name = 'shops_scraper'
        custom_settings = {'DOWNLOAD_DELAY': 1}

        def start_requests(self):
            db_cursor = HeurekaScraperPipeline().curr
            db_cursor.execute("SELECT * FROM shops_tb")
            links = db_cursor.fetchall()
            for link in links:
                url = link[3]
                print(url)
                yield scrapy.Request(url=url, callback=self.parse)

        def parse(self, response):
            url = response.request.url
            print('********************************'+url+'************************')

Thanks in advance for your help.

动漫人物


1 answer

凤凰求蛊

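Pipelines are for processing items. If you want to read something from the database, open the connection and run the query in start_requests. According to the documentation: after an item has been scraped by a spider, it is sent to the Item Pipeline, which processes it through several components that are executed sequentially.

Why not open the DB connection in start_requests?

    def start_requests(self):
        self.conn = sqlite3.connect('shops.db')
        self.curr = self.conn.cursor()
        self.curr.execute("SELECT * FROM shops_tb")
        links = self.curr.fetchall()
        # rest of the code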
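
A likely reason the spider in the question gets no URLs: calling HeurekaScraperPipeline() inside start_requests runs __init__, which calls create_table(), and that drops and recreates shops_tb before the SELECT executes. Reading the table without going through the pipeline class avoids this. Below is a minimal sketch of that idea using only sqlite3. The helper name load_links is hypothetical (it is not part of Scrapy or of the original code), and it assumes the shops_tb schema shown in the question.

```python
import sqlite3
from contextlib import closing


def load_links(db_path="shops.db"):
    """Return the non-empty values of the `link` column in shops_tb.

    Hypothetical helper: it only reads, so the table filled by the
    first spider is never dropped or recreated.
    """
    # closing() makes sure the connection is closed after the query.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute("SELECT link FROM shops_tb").fetchall()
    # Each row is a 1-tuple; skip NULL or empty links.
    return [row[0] for row in rows if row[0]]


# Assumed usage inside the second spider (requires `import scrapy`):
#
#     def start_requests(self):
#         for url in load_links():
#             yield scrapy.Request(url=url, callback=self.parse)
```

Selecting only the link column also removes the need for the positional index link[3] used in the question.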
