How do I parse XML with sibling elements into a table?

My XML looks like this:

xml = """
<portfolio>
    <assets>600000</assets>
    <assetClassDetails>
        <assetClassName>Bonds</assetClassName>
        <assetAmount>100000</assetAmount>
    </assetClassDetails>
    <assetClassDetails>
        <assetClassName>Equities</assetClassName>
        <assetAmount>500000</assetAmount>
    </assetClassDetails>
    <rateOfReturn>6.3</rateOfReturn>
</portfolio>
"""

I parse each element into a table like this:

# getparent() is an lxml method, so lxml.etree is assumed here
from lxml import etree
import pandas as pd

root = etree.fromstring(xml)

tag = []
text = []
parent = []
double_parent = []

for element in root.iter():
    # The root has no parent, so getparent() returns None and .tag raises AttributeError
    try:
        element_parent = element.getparent().tag
    except AttributeError:
        element_parent = 'none'
    # Same idea one level further up (the grandparent)
    try:
        element_double_parent = element.getparent().getparent().tag
    except AttributeError:
        element_double_parent = 'none'
    tag.append(element.tag)
    text.append(element.text)
    parent.append(element_parent)
    double_parent.append(element_double_parent)

df = pd.DataFrame({'tag': tag, 'text': text, 'parent': parent, 'double_parent': double_parent})

The result looks like this:

tag                 text      parent             double_parent
portfolio           \n        none               none
assets              600000    portfolio          none
assetClassDetails   \n        portfolio          none
assetClassName      Bonds     assetClassDetails  portfolio
assetAmount         100000    assetClassDetails  portfolio
assetClassDetails   \n        portfolio          none
assetClassName      Equities  assetClassDetails  portfolio
assetAmount         500000    assetClassDetails  portfolio
rateOfReturn        6.3       portfolio          none

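I'm struggling to work out how to transform the data so that each asset class name is paired with its amount and tied to the portfolio tag (and its direct children). How can I pair sibling tags in the result?

The result I want looks like this:

type        assets  rateOfReturn  assetClassName  assetAmount
portfolio   600000  6.3           Bonds           100000
portfolio   600000  6.3           Equities        500000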


Asked by 犯罪嫌疑人X

3 answers

DIEA

Try something like this:

rows = []
columns = ['assets', 'rateOfReturn', 'assetClassName', 'assetAmount']
for entry in root.xpath('//assetClassDetails'):
    row = []
    row.extend([entry.xpath('preceding-sibling::assets/text()')[0],
                entry.xpath('following-sibling::rateOfReturn/text()')[0],
                entry.xpath('./assetClassName/text()')[0],
                entry.xpath('./assetAmount/text()')[0]])
    rows.append(row)
pd.DataFrame(rows, columns=columns)

Output:

   assets  rateOfReturn  assetClassName  assetAmount
0  600000  6.3           Bonds           100000
1  600000  6.3           Equities        500000

Another interesting approach is to use a different library:

import pandas as pd
import pandas_read_xml as pdx
df1 = pdx.read_xml(r'path\to\myfile.xml', ['portfolio', 'assetClassDetails'])
df2 = pdx.read_xml(r'path\to\myfile.xml', ['portfolio'])
pd.concat([df2[['assets', 'rateOfReturn']], df1], axis=1)

Output:

   assets  rateOfReturn  assetClassName  assetAmount
0  600000  6.3           Bonds           100000
1  600000  6.3           Equities        500000
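The sibling lookups in the first snippet rely on XPath axes (preceding-sibling, following-sibling) that lxml supports. As a rough sketch of the same idea using only the standard library, the example below builds a child-to-parent map and reads the shared values from each block's own parent. The <portfolios> wrapper and the second portfolio are hypothetical additions, included only to show that every row picks up the values of its own parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the <portfolios> wrapper and the second <portfolio>
# are assumptions made for illustration; they are not in the question.
xml_text = """
<portfolios>
    <portfolio>
        <assets>600000</assets>
        <assetClassDetails>
            <assetClassName>Bonds</assetClassName>
            <assetAmount>100000</assetAmount>
        </assetClassDetails>
        <assetClassDetails>
            <assetClassName>Equities</assetClassName>
            <assetAmount>500000</assetAmount>
        </assetClassDetails>
        <rateOfReturn>6.3</rateOfReturn>
    </portfolio>
    <portfolio>
        <assets>250000</assets>
        <assetClassDetails>
            <assetClassName>Cash</assetClassName>
            <assetAmount>250000</assetAmount>
        </assetClassDetails>
        <rateOfReturn>1.5</rateOfReturn>
    </portfolio>
</portfolios>
"""

root = ET.fromstring(xml_text)

# ElementTree elements have no getparent(), so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

rows = []
for details in root.iter('assetClassDetails'):
    parent = parent_of[details]
    rows.append({
        'type': parent.tag,
        # Siblings of this block are simply other children of its parent.
        'assets': parent.findtext('assets'),
        'rateOfReturn': parent.findtext('rateOfReturn'),
        'assetClassName': details.findtext('assetClassName'),
        'assetAmount': details.findtext('assetAmount'),
    })

for row in rows:
    print(row)
```

Looking values up through the parent element, rather than through the parent's tag name, is what keeps rows from different portfolios apart.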

肥皂起泡泡

Another way to use the package that @JackFleeting mentioned might be:

import pandas_read_xml as pdx
from pandas_read_xml import fully_flatten
df = (pdx.read_xml(r'path\to\myfile.xml', ['portfolio'])
      .pipe(fully_flatten))

Flattening expands lists (sibling tags in the XML) into separate rows, and dictionaries (child tags in the XML) into separate columns.
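To make the rows-versus-columns idea concrete, here is a small pure-Python sketch of that kind of flattening. It is not the pandas_read_xml implementation: the flatten helper, the "." separator in column names, and the hand-written record dictionary are all assumptions chosen for illustration.

```python
def flatten(record, sep="."):
    """Return a list of flat row dicts built from one nested record.

    Illustrative helper only; not part of pandas_read_xml.
    """
    rows = [{}]
    for key, value in record.items():
        items = value if isinstance(value, list) else [value]
        partial_rows = []
        for item in items:
            if isinstance(item, dict):
                # A nested dict (child tags) contributes prefixed columns.
                for sub in flatten(item, sep):
                    partial_rows.append(
                        {f"{key}{sep}{k}": v for k, v in sub.items()}
                    )
            else:
                partial_rows.append({key: item})
        if not partial_rows:
            # An empty list still yields one row, with a missing value.
            partial_rows = [{key: None}]
        # Combine every existing row with every partial row, so a list of
        # n siblings multiplies the number of rows by n.
        rows = [{**row, **part} for row in rows for part in partial_rows]
    return rows


# Hand-written stand-in for the parsed <portfolio> element (an assumption).
record = {
    'assets': '600000',
    'assetClassDetails': [
        {'assetClassName': 'Bonds', 'assetAmount': '100000'},
        {'assetClassName': 'Equities', 'assetAmount': '500000'},
    ],
    'rateOfReturn': '6.3',
}

flat_rows = flatten(record)
for row in flat_rows:
    print(row)
```

The two assetClassDetails entries become two rows, while assets and rateOfReturn are repeated on each of them.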

小怪兽爱吃肉

See below (no external libraries are used):

import xml.etree.ElementTree as ET

xml = """<portfolio>
    <assets>600000</assets>
    <assetClassDetails>
        <assetClassName>Bonds</assetClassName>
        <assetAmount>100000</assetAmount>
    </assetClassDetails>
    <assetClassDetails>
        <assetClassName>Equities</assetClassName>
        <assetAmount>500000</assetAmount>
    </assetClassDetails>
    <rateOfReturn>6.3</rateOfReturn>
</portfolio>"""

data = []
root = ET.fromstring(xml)
# Values shared by every row: the root's direct children and its tag
global_properties = {'assets': root.find('assets').text,
                     'rateOfReturn': root.find('rateOfReturn').text,
                     'type': root.tag}
for asset in root.findall('.//assetClassDetails'):
    # One row per assetClassDetails block, keyed by child tag
    entry = {x.tag: x.text for x in list(asset)}
    for k, v in global_properties.items():
        entry[k] = v
    data.append(entry)
for entry in data:
    print(entry)

Output:

{'assetClassName': 'Bonds', 'assetAmount': '100000', 'assets': '600000', 'rateOfReturn': '6.3', 'type': 'portfolio'}
{'assetClassName': 'Equities', 'assetAmount': '500000', 'assets': '600000', 'rateOfReturn': '6.3', 'type': 'portfolio'}
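The dictionaries printed above hold the right values but not in the column order the question asks for. A minimal sketch of the last step, still using only the standard library, writes them out as a table with csv.DictWriter. The data list is copied from the printed output so the snippet runs on its own; the columns list and the choice of CSV as the output format are assumptions.

```python
import csv
import io

# Rows copied from the printed output so this sketch is standalone.
data = [
    {'assetClassName': 'Bonds', 'assetAmount': '100000',
     'assets': '600000', 'rateOfReturn': '6.3', 'type': 'portfolio'},
    {'assetClassName': 'Equities', 'assetAmount': '500000',
     'assets': '600000', 'rateOfReturn': '6.3', 'type': 'portfolio'},
]

# Column order taken from the desired result in the question.
columns = ['type', 'assets', 'rateOfReturn', 'assetClassName', 'assetAmount']

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator='\n')
writer.writeheader()
writer.writerows(data)

table_text = buffer.getvalue()
print(table_text)
```

If pandas is available, pd.DataFrame(data)[columns] should give the same column order as a DataFrame.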