Extracting <content:encoded> with ElementTree

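I am trying to figure out how to extract the content between <content:encoded> and </content:encoded> using ElementTree in Python. The Python code I am currently using is attached below, but it does not extract the content. I want to extract "I love playing basketball and eating food." Can anyone help me see what is wrong with my code?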


xml = '''<item>
        <title>Defensive Moves</title>
        <link>www.timmy256.wordpress.com</link>
        <pubDate></pubDate>
        <dc:creator><![CDATA[jross]]></dc:creator>
        <guid isPermaLink="false"> www.timmy256.wordpress.com </guid>
        <description></description>
        <content:encoded><![CDATA[I love playing basketball and eating food.]]></content:encoded>
        </item>'''

import xml.etree.ElementTree as ET

tree = ET.parse(xml)
root = tree.getroot()
data = root.iter("content:encoded").text


小唯快跑啊
1 Answer

素胚勾勒不出你

Another approach, using the simplified_scrapy library instead of ElementTree:

from simplified_scrapy import SimplifiedDoc

xml = '''<item>
        <title>Defensive Moves</title>
        <link>www.timmy256.wordpress.com</link>
        <pubDate></pubDate>
        <dc:creator><![CDATA[jross]]></dc:creator>
        <guid isPermaLink="false"> www.timmy256.wordpress.com </guid>
        <description></description>
        <content:encoded><![CDATA[I love playing basketball and eating food.]]></content:encoded>
        </item>'''

doc = SimplifiedDoc(xml)
print(doc.select('item>content:encoded>html()')[9:-3])

Result:

I love playing basketball and eating food.

There are more examples here: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
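For the ElementTree route the question asks about, here is a minimal sketch. The code in the question fails for three reasons: ET.parse() expects a file name or file object rather than an XML string, the dc: and content: prefixes are never bound to a namespace, and root.iter() returns an iterator, which has no .text attribute. The sketch assumes the fragment comes from an RSS/WordPress-style feed whose root element would normally declare these prefixes. The namespace URIs below are the ones commonly used for them and should be checked against the real file. The wrapping <rss> element and the names NS, item_xml and wrapped are only illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the URIs the real feed binds to the prefixes.
# Check the xmlns:... attributes on the root element of the actual file.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

item_xml = '''<item>
        <title>Defensive Moves</title>
        <dc:creator><![CDATA[jross]]></dc:creator>
        <content:encoded><![CDATA[I love playing basketball and eating food.]]></content:encoded>
        </item>'''

# The bare fragment uses undeclared prefixes, so it is wrapped in a
# hypothetical root element that declares them before parsing.
wrapped = (
    f'<rss xmlns:content="{NS["content"]}" xmlns:dc="{NS["dc"]}">'
    f"{item_xml}</rss>"
)

# fromstring() parses XML held in a string; parse() is for files.
root = ET.fromstring(wrapped)

# find() accepts a prefix-to-URI mapping, so the prefixed name can be used.
encoded = root.find("item/content:encoded", NS)
text = encoded.text if encoded is not None else None
print(text)
```

If the full feed file is available, parsing it directly with ET.parse(path) and reusing the same NS mapping in find() or iter() should make the wrapping step unnecessary, since the root element already declares the prefixes.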