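Removing all child nodes after a given child node

I want to remove every node (including text) inside an element that comes after the <hr/> element, along with the <hr/> itself.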


For example, this:


<td class="one">
    Some text
    <a href="page1.html"/>
    <br/>
    Some more text
    <br/>
    <a href="page2.html"/>
    <hr/>
    Bottom text
    <br/>
    <a href="page3.html"/>
</td>

should become:


<td class="one">
    Some text
    <a href="page1.html"/>
    <br/>
    Some more text
    <br/>
    <a href="page2.html"/>
</td>

I have this XPath to find all the nodes below the <hr/>:


./node()[ preceding-sibling::hr[not(following-sibling::hr)] ]

But I don't know how to remove these nodes. I tried this:


xp = './node()[ preceding-sibling::hr[not(following-sibling::hr)] ]'
els = self.xpath(xp, td_el)
for el in els:
    el.getparent().remove(el)

But it doesn't work for text nodes.

What is the best way to do this? Thanks.


胡说叔叔
1 Answer

阿晨1998

Try removing the nodes with the following code:

from lxml import etree, html

source = """<td class="one">
    Some text
    <a href="page1.html"/>
    <br/>
    Some more text
    <br/>
    <a href="page2.html"/>
    <hr/>
    Bottom text
    <br/>
    <a href="page3.html"/>
</td>"""

# Use a separate name so the imported html module is not shadowed.
doc = html.fromstring(source)
parent = doc.xpath('//td')[0]
# Select <hr/> and every element sibling that follows it.
redundant = doc.xpath('//hr/preceding-sibling::*[1]/following-sibling::*')
for node in redundant:
    # remove() also drops the element's tail text, so "Bottom text" goes with <hr/>.
    parent.remove(node)
# encoding='unicode' returns a str rather than bytes, so print() shows plain markup.
print(etree.tostring(parent, encoding='unicode'))

Output:

<td class="one">
    Some text
    <a href="page1.html"/>
    <br/>
    Some more text
    <br/>
    <a href="page2.html"/>
</td>
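As for why the loop in the question skips text: in lxml's tree model, text is not a separate child node. It is stored as the .text of the parent or the .tail of the element it follows, so a text result returned by XPath cannot be passed to remove(). Below is a minimal sketch of one way to work with that model, assuming the cut should happen at the last <hr/> child (as the XPath in the question implies). The helper name truncate_at_last is hypothetical, not an lxml API, and the sketch falls back to the standard library's ElementTree, which shares the text/tail model, if lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: the stdlib ElementTree is an acceptable stand-in here.
    import xml.etree.ElementTree as etree

SOURCE = """<td class="one">
    Some text
    <a href="page1.html"/>
    <br/>
    Some more text
    <br/>
    <a href="page2.html"/>
    <hr/>
    Bottom text
    <br/>
    <a href="page3.html"/>
</td>"""


def truncate_at_last(parent, tag):
    """Remove the last <tag> child of parent and every sibling after it.

    Hypothetical helper for illustration. Returns True if a cut was made.
    """
    children = list(parent)
    # Find the index of the last child with the requested tag, if any.
    cut = None
    for i, child in enumerate(children):
        if child.tag == tag:
            cut = i
    if cut is None:
        return False
    for child in children[cut:]:
        # remove() discards the element together with its tail text,
        # so "Bottom text" (the tail of <hr/>) disappears as well.
        parent.remove(child)
    return True


td = etree.fromstring(SOURCE)
truncate_at_last(td, "hr")
result = etree.tostring(td, encoding="unicode")
print(result)
```

The whitespace between the last remaining <a/> and the removed <hr/> is the tail of that <a/>, so it stays in place; strip it from the last child's .tail if exact output formatting matters.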