
Strip a single element in lxml

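I need to remove an XML element while keeping its data. The lxml function strip_tags does remove the element, but it works recursively, and I only want to strip one particular element.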

I tried the answer from this post, but remove deletes the whole element.

from lxml import etree as ET

xml = """
<groceries>
  One <fruit state="rotten">apple</fruit> a day keeps the doctor away.
  This <fruit state="fresh">pear</fruit> is fresh.
</groceries>
"""

tree = ET.fromstring(xml)

for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)

print(ET.tostring(tree, pretty_print=True))

I want to get:


<groceries>
    One apple a day keeps the doctor away.
    This <fruit state="fresh">pear</fruit> is fresh.
</groceries>

but I get:


<groceries>
    This <fruit state="fresh">pear</fruit> is fresh.
</groceries>
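The lost sentence comes from how lxml stores mixed content. Text before a child belongs to the parent's text, and text after a child's closing tag is that child's tail. remove() detaches the child together with its tail. Below is a minimal sketch on a shortened, made-up document (not the asker's exact data):

```python
from lxml import etree

# Shortened, made-up version of the question's document.
root = etree.fromstring("<groceries>One <fruit>apple</fruit> a day.</groceries>")
fruit = root[0]

# Text before the child belongs to the parent; text after it is the child's tail.
before = (root.text, fruit.text, fruit.tail)
print(before)  # ('One ', 'apple', ' a day.')

# remove() detaches the child together with its tail, so ' a day.' disappears too.
root.remove(fruit)
after = etree.tostring(root).decode()
print(after)  # <groceries>One </groceries>
```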

I tried using strip_tags:


for bad in tree.xpath("//fruit[@state='rotten']"):
    ET.strip_tags(bad.getparent(), bad.tag)


<groceries>
    One apple a day keeps the doctor away.
    This pear is fresh.
</groceries>

But this strips every fruit element. I only want to strip the ones with state='rotten'.
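The following approach is not taken from the thread. It still uses strip_tags, but limits its effect to the rotten fruit by first giving only the matching elements a temporary tag name and then stripping that name. The placeholder "strip-me" is a hypothetical name, and it must not already occur anywhere in the document.

```python
from lxml import etree

xml = """<groceries>
  One <fruit state="rotten">apple</fruit> a day keeps the doctor away.
  This <fruit state="fresh">pear</fruit> is fresh.
</groceries>"""

tree = etree.fromstring(xml)

# Hypothetical placeholder name; pick one that cannot clash with real tags.
PLACEHOLDER = "strip-me"

# Rename only the elements that should disappear...
for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.tag = PLACEHOLDER

# ...then let strip_tags merge their text and tail into the surrounding content.
etree.strip_tags(tree, PLACEHOLDER)

result = etree.tostring(tree).decode()
print(result)
```

Because strip_tags matches by tag name only, the fresh fruit is left alone. Its tag was never changed.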


Cats萌萌
111 views · 1 answer

ibeautiful

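Maybe someone else has a better idea, but here is one possible workaround:

from lxml import etree

tree = etree.fromstring(xml)
bad = tree.xpath(".//fruit[@state='rotten']")[0]  # for simplicity, I didn't bother with a for loop in this case
txt = (bad.text or "") + (bad.tail or "")  # collect the text content of bad; note it is not just 'apple' but also bad.tail, the text after </fruit>
parent = bad.getparent()
parent.text = (parent.text or "") + txt  # add the collected text to the parent's existing text (this works because bad is the parent's first child)
parent.remove(bad)  # this gets rid of only this specific 'bad' (its tail goes with it, but was copied above)
print(etree.tostring(tree).decode())

Output:

<groceries>
  One apple a day keeps the doctor away.
  This <fruit state="fresh">pear</fruit> is fresh.
</groceries>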
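The workaround assumes that the rotten fruit is its parent's first child and has no child elements of its own. If the element has a preceding sibling, the text in front of it lives in that sibling's tail rather than in parent.text. Any children also need to be moved up. A more general helper might look like the sketch below. Note that unwrap is a hypothetical name, not an lxml function. Elements from lxml.html have a similar drop_tag() method, but plain lxml.etree elements do not.

```python
from lxml import etree


def unwrap(el):
    """Hypothetical helper: remove el but keep its text, children and tail in place."""
    parent = el.getparent()
    if parent is None:
        raise ValueError("cannot unwrap the root element")

    children = list(el)
    leading = el.text or ""   # text inside el, before its first child
    trailing = el.tail or ""  # text after el's closing tag

    # The tail must end up after el's last child, or directly after el.text.
    if children:
        children[-1].tail = (children[-1].tail or "") + trailing
    else:
        leading += trailing

    # Text before el lives in the previous sibling's tail, or in parent.text
    # when el is the first child; append el's leading text there.
    prev = el.getprevious()
    if prev is not None:
        prev.tail = (prev.tail or "") + leading
    else:
        parent.text = (parent.text or "") + leading

    # Move el's children into the parent at el's position, then drop el
    # (remove() also drops el.tail, which was already copied above).
    index = parent.index(el)
    for offset, child in enumerate(children):
        parent.insert(index + offset, child)
    parent.remove(el)


xml = """<groceries>
  One <fruit state="rotten">apple</fruit> a day keeps the doctor away.
  This <fruit state="fresh">pear</fruit> is fresh.
</groceries>"""

tree = etree.fromstring(xml)
for bad in tree.xpath("//fruit[@state='rotten']"):
    unwrap(bad)

result = etree.tostring(tree).decode()
print(result)
```

Because each match is unwrapped relative to its own parent and previous sibling, the same loop also works when the rotten fruit is not the first child.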