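I'm completely new to scrapy and trying to learn it.
https://www.killertools.com/Dent-Removal-Aluminum-Steel_c_11.html For starters, I want to scrape the item names of all products in the first category, across both pages when there is more than one page of products to go through.
This is what I've got so far, and it works: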
import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'killertools'
    start_urls = ['https://www.killertools.com/Dent-Removal-Aluminum-Steel_c_11.html',
                  ]

    def parse(self, response):
        for item in response.css('div.name'):
            yield {'Name': item.xpath('a/text()').get()}

        next_page = response.css('div.paging a:nth-child(4)::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
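But I'd also like to go into each product link, extract the item description, and put it into the dictionary as Description. How do I go about doing that?
I tried something like this: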
import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'killertools'
    start_urls = ['https://www.killertools.com/Dent-Removal-Aluminum-Steel_c_11.html',
                  ]

    def parse(self, response):
        for item in response.css('div.name'):
            yield {'Name': item.xpath('a/text()').get()}

        detail_page = response.css('div.name a::attr("href")').get()
        if detail_page is not None:
            yield response.follow(detail_page)

        for detail in response.css('div.item'):
            yield {'Description': detail.xpath('p/strong/text').get()}

        next_page = response.css('div.paging a:nth-child(4)::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
But it does some weird things that I can't really make sense of at my level.