Getting node depth with xml.etree.ElementTree

XML:


<?xml version="1.0"?>
<pages>
    <page>
        <url>http://example.com/Labs</url>
        <title>Labs</title>
        <subpages>
            <page>
                <url>http://example.com/Labs/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Labs/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Labs/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
    <page>
        <url>http://example.com/Tests</url>
        <title>Tests</title>
        <subpages>
            <page>
                <url>http://example.com/Tests/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Tests/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Tests/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
</pages>

Code:


# rexml is the XML string read from a URL
from xml.etree import ElementTree as ET

tree = ET.fromstring(rexml)

# Visit every <page> element in document order and print its url and title.
for node in tree.iter('page'):
    for url in node.iterfind('url'):
        print(url.text)
    for title in node.iterfind('title'):
        print(title.text)
    print('-' * 30)

Output:


http://example.com/article1
Article1
------------------------------
http://example.com/article1/subarticle1
SubArticle1
------------------------------
http://example.com/article2
Article2
------------------------------
http://example.com/article3
Article3
------------------------------

The XML represents a tree-like sitemap structure.


I have been digging through the docs and Google all day and cannot figure out how to get the node depth of the entries.


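I tried counting the child containers, but that only works for the first parent and then breaks, because I cannot figure out how to reset the count. It is probably just a hacky idea anyway.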



慕村225694
2 answers

翻翻过去那场雪

Use lxml.html, whose elements provide getparent():

import lxml.html

rexml = ...

def depth(node):
    # Count the node itself plus every ancestor, so the root has depth 1.
    d = 0
    while node is not None:
        d += 1
        node = node.getparent()
    return d

tree = lxml.html.fromstring(rexml)
for node in tree.iter('page'):
    print(depth(node))
    for url in node.iterfind('url'):
        print(url.text)
    for title in node.iterfind('title'):
        print(title.text)
    print('-' * 30)
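If lxml is not available, the same parent-walking idea can probably be reproduced with the standard library. Elements in xml.etree.ElementTree have no getparent(), so the sketch below first builds a child-to-parent dictionary. The SAMPLE string and the helper names build_parent_map and depth are made up for illustration; SAMPLE is a cut-down stand-in for rexml.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down sitemap standing in for rexml.
SAMPLE = """<pages>
    <page>
        <url>http://example.com/Labs</url>
        <title>Labs</title>
        <subpages>
            <page>
                <url>http://example.com/Labs/Email</url>
                <title>Email</title>
            </page>
        </subpages>
    </page>
</pages>"""


def build_parent_map(root):
    """Map every element to its parent; the root itself gets no entry."""
    return {child: parent for parent in root.iter() for child in parent}


def depth(node, parent_map):
    """Count the node plus all of its ancestors, so the root has depth 1."""
    d = 0
    while node is not None:
        d += 1
        node = parent_map.get(node)  # None once we step past the root
    return d


root = ET.fromstring(SAMPLE)
parents = build_parent_map(root)

# Pair each page's url with its depth, in document order.
page_depths = [(page.findtext("url"), depth(page, parents))
               for page in root.iter("page")]
for url, d in page_depths:
    print(d, url)
```

Building the map costs one extra pass over the tree, and the depth values follow the same convention as the lxml version (the root counts as 1).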

动漫人物

The Python ElementTree API provides iterators for depth-first traversal of an XML tree. Unfortunately, those iterators do not give the caller any depth information. But you can write a depth-first iterator that also returns the depth of each element:

import xml.etree.ElementTree as ET

def depth_iter(element, tag=None):
    # Each stack entry is an iterator over the children of one ancestor.
    stack = []
    stack.append(iter([element]))
    while stack:
        e = next(stack[-1], None)
        if e is None:
            # This level is exhausted; go back up to the parent level.
            stack.pop()
        else:
            stack.append(iter(e))
            if tag is None or e.tag == tag:
                yield (e, len(stack) - 1)

Note that this is more efficient than determining the depth by following parent links (as when using lxml): it is O(n) versus O(n log n).
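A possible way to call the iterator on the sitemap is sketched below. depth_iter is repeated so the snippet runs on its own; the SAMPLE string is a hypothetical cut-down version of the XML from the question, and the results name is only for illustration.

```python
import xml.etree.ElementTree as ET


def depth_iter(element, tag=None):
    """Yield (element, depth) pairs depth-first; the starting element has depth 1."""
    stack = [iter([element])]
    while stack:
        e = next(stack[-1], None)
        if e is None:
            stack.pop()  # finished this level
        else:
            stack.append(iter(e))  # descend into e's children next
            if tag is None or e.tag == tag:
                yield e, len(stack) - 1


# Hypothetical cut-down sitemap standing in for rexml.
SAMPLE = """<pages>
    <page>
        <url>http://example.com/Labs</url>
        <title>Labs</title>
        <subpages>
            <page>
                <url>http://example.com/Labs/Email</url>
                <title>Email</title>
            </page>
        </subpages>
    </page>
    <page>
        <url>http://example.com/Tests</url>
        <title>Tests</title>
    </page>
</pages>"""

root = ET.fromstring(SAMPLE)

# Collect each page title together with its element depth.
results = [(page.findtext("title"), d) for page, d in depth_iter(root, "page")]
for title, d in results:
    print(d, title)
```

In this layout every nesting level adds a page element and a subpages wrapper, so top-level pages come out at depth 2 and their subpages at depth 4. Dividing the depth by two would give the page nesting level, assuming the document keeps that structure.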