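How do I iterate over XML child nodes in Scrapy using Python?

I want to scrape the comments on this page, but I can't figure out how to iterate over the child nodes of the node that contains the comments and extract the data points.

Here is part of the HTML: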

</a>

</div>

<a href="https://www.picuki.com/profile/alexandera_300">@alexandera_300</a>

</div>

#followforfollowback

</div>

</a>

</div>

<a href="https://www.picuki.com/profile/coxlogan2008">@coxlogan2008</a>

</div>

👏

</div>

This is the Python code snippet I am using:

def parse_post(self, response):
    img_url = response.meta['img_url']
    caption = response.meta['caption']

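However, when I run it, I only get the data for the nth comment. Can someone help me? I don't understand why the code does not iterate over the nodes.

Thanks in advance!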

不负相思意


1 Answer

慕哥6287543

I think your problem comes from the XPath you use for "comments". By selecting only the text, you get strings rather than nodes, so there are no comment nodes left to run relative queries against. The following change made it work for me:

# the likes & number of comments only have to be taken once, should not be part of the loop
likes = response.xpath('.//span[@class="icon-thumbs-up-alt"]/text()').get()
num_of_comments = response.xpath('.//span[@id="commentsCount"]/text()').get()

comments = response.xpath('//div[@id="commantsPlace"]/*[@class="comment"]')
for comment in comments:
    comment_user_name = comment.xpath('.//*[@class="comment-user-nickname"]/a/text()').get()
    comment_text = comment.xpath('.//*[@class="comment-text"]/text()').get()
