How do I scrape items that have no reference or name attribute?

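I'm really new to Scrapy. I tried the basic code, but this case is a bit unusual and I've tried several approaches without success. How can I get the number of Like, Love and Informative reactions on this page? https://teslamotorsclub.com/tmc/threads/tesla-tsla-the-investment-world-the-2019-investors-roundtable.139047/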


Here is the relevant HTML:


<ul class="dark_postrating_outputlist">
<li>
<i class="fa fa-info-circle"></i> Informative x <strong>1</strong>
</li>
<li>
<i class="fa fa-thumbs-o-up"></i> Like x <strong>1</strong>
</li>
</ul>

I want to get the specific counts inside it. I tried this:


response.css('ul.dark_postrating_outputlist i.fa.fa-thumbs-o-up strong::text').extract_first()

But it doesn't work. Any ideas? Thanks.


杨魅力
3 Answers

明月笑刀无情

Try the following to get what you need:

import scrapy

class TeslamotorsclubSpider(scrapy.Spider):
    name = "teslamotorsclub"
    start_urls = ["https://teslamotorsclub.com/tmc/threads/tesla-tsla-the-investment-world-the-2019-investors-roundtable.139047/"]

    def parse(self, response):
        for item in response.css("[id^='fc-post-']"):
            author = item.css(".author::text").get()
            like = item.css(".fa-thumbs-o-up + strong::text").get()
            love = item.css(".fa-heart-o + strong::text").get()
            informative = item.css(".fa-info-circle + strong::text").get()
            yield {"author": author, "like": like, "love": love, "informative": informative}

Partial output:

{'author': 'Unpilot', 'like': '1', 'love': '4', 'informative': '1'}
{'author': 'UnknownSoldier', 'like': '7', 'love': '2', 'informative': '1'}
{'author': 'SpaceCash', 'like': '2', 'love': '15', 'informative': '2'}
{'author': 'gene', 'like': '45', 'love': '18', 'informative': '1'}
{'author': 'engle', 'like': '31', 'love': '5', 'informative': '15'}
{'author': 'Unpilot', 'like': '11', 'love': '3', 'informative': None}
{'author': 'SebastianR', 'like': '3', 'love': None, 'informative': None}
{'author': 'Buckminster', 'like': '1', 'love': '4', 'informative': None}

四季花海

You can add more specific selectors to separate the "Like" and "Informative" data. Check this example:

>>> txt = """<ul class="dark_postrating_outputlist">
...  <li>
...  <i class="fa fa-info-circle"></i> Informative x <strong>1</strong>
...  </li>
...  <li>
...  <i class="fa fa-thumbs-o-up"></i> Like x <strong>2</strong>
...  </li>
...  </ul>"""
>>> from scrapy import Selector
>>> sel = Selector(text=txt)
>>> sel.css('ul.dark_postrating_outputlist li:contains("Informative") strong::text').get()
u'1'
>>> sel.css('ul.dark_postrating_outputlist li:contains("Like") strong::text').get()
u'2'

This way you can get each number separately.

慕侠2389804

Use XPath instead of CSS:

response.xpath('//ul[@class="dark_postrating_outputlist"]/li[i[contains(@class, "fa-thumbs-o-up")]]/strong/text()').get()