
Python: building the distinct paths/tree from an XML file

Here is a sample of the XML file:


<?xml version="1.0" encoding="utf-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header />
  <SOAP-ENV:Body>
    <ADD_LandIndex_001>
      <CNTROLAREA>
        <BSR>
          <status>ADD</status>
          <NOUN>LandIndex</NOUN>
          <REVISION>001</REVISION>
        </BSR>
      </CNTROLAREA>
      <DATAAREA>
        <LandIndex>
          <reportId>AMI100031</reportId>
          <requestKey>R3278458</requestKey>
          <SubmittedBy>EN4871</SubmittedBy>
          <submittedOn>2015/01/06 4:20:11 PM</submittedOn>
          <LandIndex>
            <agreementdetail>
              <agreementid>001       4860</agreementid>
              <agreementtype>NATURAL GAS</agreementtype>
              <currentstatus>
                <status>ACTIVE</status>
                <statuseffectivedate>1965/02/18</statuseffectivedate>
                <termdate>1965/02/18</termdate>
              </currentstatus>
              <designatedrepresentative>
              </designatedrepresentative>
            </agreementdetail>
          </LandIndex>
        </LandIndex>
      </DATAAREA>
    </ADD_LandIndex_001>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

I want to store in a list every distinct path in the XML file that leads to an element containing text. So I want something like this:


['Envelope/Body/ADD_LandIndex_001/CNTROLAREA/BSR/status', 'Envelope/Body/ADD_LandIndex_001/CNTROLAREA/BSR/NOUN', ...]

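I tried some code, but it doesn't work. I don't know how to pick out just the last element of a branch, or how to restart the whole path from the root when I move to a different node partway down the tree (i.e. Envelope/Body/ADD_LandIndex_001/DATAAREA...).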


import xml.etree.ElementTree as et

filename = 'file_try.xml'
element_tree = et.parse(filename)
root = element_tree.getroot()
namespace = "{http://schemas.xmlsoap.org/soap/envelope/}"


def remove_namespace(string, namespace):
    # Strip the "{namespace-uri}" prefix that ElementTree puts in front of tag names.
    # str.replace leaves the string unchanged when the prefix is absent,
    # so no separate regex check is needed.
    return string.replace(namespace, '')


Can anyone help me?


慕哥9229398
1 Answer

狐的传说

You can adapt this to your actual code, but basically it should look like this:

from lxml import etree

soap = """[your xml above]"""
# Encode to bytes: lxml rejects str input that carries an XML encoding declaration
root = etree.XML(soap.encode())
tree = etree.ElementTree(root)
# Visit every text node and keep only those that are not pure whitespace
for target in root.xpath('//text()'):
    if target.strip():
        # Path of the element that owns the text, with the namespace prefix removed
        print(tree.getpath(target.getparent()).replace('SOAP-ENV:', ''))

Output:

/Envelope/Body/ADD_LandIndex_001/CNTROLAREA/BSR/status
/Envelope/Body/ADD_LandIndex_001/CNTROLAREA/BSR/NOUN
/Envelope/Body/ADD_LandIndex_001/CNTROLAREA/BSR/REVISION
/Envelope/Body/ADD_LandIndex_001/DATAAREA/LandIndex/reportId
/Envelope/Body/ADD_LandIndex_001/DATAAREA/LandIndex/requestKey
/Envelope/Body/ADD_LandIndex_001/DATAAREA/LandIndex/SubmittedBy
/Envelope/Body/ADD_LandIndex_001/DATAAREA/LandIndex/submittedOn
/Envelope/Body/ADD_LandIndex_001/DATAAREA/LandIndex/LandIndex/agreementdetail/agreementid
/Envelope/Body/ADD_LandIndex_001/DATAAREA/LandIndex/LandIndex/agreementdetail/agreementtype
/Envelope/Body/ADD_LandIndex_001/DATAAREA/LandIndex/LandIndex/agreementdetail/currentstatus/status
/Envelope/Body/ADD_LandIndex_001/DATAAREA/LandIndex/LandIndex/agreementdetail/currentstatus/statuseffectivedate
/Envelope/Body/ADD_LandIndex_001/DATAAREA/LandIndex/LandIndex/agreementdetail/currentstatus/termdate
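If installing lxml is not an option, roughly the same result can probably be had with the standard library alone, which is what the question started from. The following is only a sketch under a few assumptions: the helper names `local_name`, `text_paths` and `distinct_text_paths` are made up for illustration, the embedded `sample` is a shortened stand-in for the real file, and "contains text" is taken to mean that the element's own text is not just whitespace. Passing the parent's path down through the recursion is what restarts each branch from the root, and `dict.fromkeys` collapses repeated paths into a list of distinct ones.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace-uri}' from an ElementTree tag, if any."""
    return tag.rsplit('}', 1)[-1]


def text_paths(element, prefix=''):
    """Yield the path of every element whose own text is non-blank."""
    name = local_name(element.tag)
    path = f"{prefix}/{name}" if prefix else name
    if element.text and element.text.strip():
        yield path
    # Each child receives the full path of its parent, so every branch
    # is rebuilt from the root no matter where the traversal currently is.
    for child in element:
        yield from text_paths(child, path)


def distinct_text_paths(root):
    """Return the distinct paths in first-seen order."""
    return list(dict.fromkeys(text_paths(root)))


# Shortened, hypothetical stand-in for the real file; the second <BSR>
# block is there only to show that repeated paths are collapsed.
sample = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <CNTROLAREA>
      <BSR>
        <status>ADD</status>
        <NOUN>LandIndex</NOUN>
      </BSR>
      <BSR>
        <status>ADD</status>
      </BSR>
    </CNTROLAREA>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

paths = distinct_text_paths(ET.fromstring(sample))
print(paths)
# ['Envelope/Body/CNTROLAREA/BSR/status', 'Envelope/Body/CNTROLAREA/BSR/NOUN']
```

For the real file, `ET.parse('file_try.xml').getroot()` would take the place of `ET.fromstring(sample)`. Unlike lxml's `getpath`, this sketch does not add positional indexes such as `BSR[2]` for repeated siblings, which is why duplicates merge into a single path.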