I'm using the lxml library and have the following XML file, doc.xml. I want to strip some of its tags and rename others:
<html>
<body>
<h5>Fruits</h5>
<div>This is some <span attr="foo">Text</span>.</div>
<div>Some <span>more</span> text.</div>
<h5>Vegetables</h5>
<div>Yet another line <span attr="bar">of</span> text.</div>
<div>This span will get <span attr="foo">removed</span> as well.</div>
<div>Nested elements <span attr="foo">will <b>be</b> left</span> alone.</div>
<div>Unless <span attr="foo">they <span attr="foo">also</span> match</span>.</div>
</body>
</html>
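Instead of html and body, I want everything wrapped in a single p tag. And instead of h5 and the div elements being siblings, each h5 should wrap the div elements that follow it, as in the example below. My question is: how can I convert the first format into the following one using lxml?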
<p>
<h5 title='Fruits'>
<div>This is some <span attr='foo'>Text</span>.</div>
<div>Some <span>more</span> text.</div>
</h5>
<h5 title='Vegetables'>
<div>Yet another line <span attr='bar'>of</span> text.</div>
....
</h5>
</p>
Here is what I tried with lxml to strip the tags:
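import lxml.html
from lxml import etree

tree1 = lxml.html.parse('doc.xml').getroot()  # parse the file itself; etree.tostring() expects an element, not a path
etree.strip_tags(tree1, 'body')  # removes the <body> tag but keeps its children and text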
Does anyone have any ideas on how to do this?
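One possible approach is to build a new tree rather than strip tags in place. This is only a minimal sketch, under these assumptions: every h5 and div is a direct child of body, the heading text should become the title attribute, and anything before the first h5 can be dropped. The names `regroup` and `DOC` are made up for illustration, and the sample is a shortened copy of doc.xml. Only the ElementTree-compatible part of the lxml API is used, so the fallback import is there purely to let the sketch run where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for environments without lxml; the calls below exist in both.
    import xml.etree.ElementTree as etree

# Shortened copy of doc.xml, inlined so the sketch is self-contained.
DOC = """<html>
<body>
<h5>Fruits</h5>
<div>This is some <span attr="foo">Text</span>.</div>
<div>Some <span>more</span> text.</div>
<h5>Vegetables</h5>
<div>Yet another line <span attr="bar">of</span> text.</div>
</body>
</html>"""


def regroup(source_root):
    """Return a new <p> root where each <h5> wraps the elements that followed it."""
    body = source_root.find("body")
    new_root = etree.Element("p")
    current = None  # the <h5> group currently being filled
    # Iterate over a copy, because lxml moves an element when it is appended elsewhere.
    for child in list(body):
        if child.tag == "h5":
            # Start a new group; the heading text becomes the title attribute.
            title = (child.text or "").strip()
            current = etree.SubElement(new_root, "h5", title=title)
        elif current is not None:
            # Nested spans and their text come along with the div unchanged.
            current.append(child)
    return new_root


new_root = regroup(etree.fromstring(DOC))
print(etree.tostring(new_root, encoding="unicode"))
```

The root returned by lxml.html.parse('doc.xml').getroot() should work as the argument in the same way, since it is also an html element with a body child. Note that the result is fine as XML, but block elements inside h5 inside p would not be valid HTML.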