Parsing deeply nested XML into a pandas DataFrame

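I am trying to take specific parts of an XML file and move them into a pandas DataFrame. Having followed some tutorials on xml.etree, I am still stuck on getting output. So far I have managed to find the child nodes, but I cannot access them (i.e. I cannot get the actual data out of them). Here is what I have so far.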


import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()
root.tag

# find all nodes within the XML data
tree.findall(".//*")

# access the node
tree.findall(".//{http://someUrl.nl/schema/enterprise/program}programSummaryText")

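What I want is to get the data from the programDescriptions node, specifically from its child programDescriptionText xml:lang="nl", and of course some additional ones. But I want to focus on this one first.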


Some data to work with:


<?xml version="1.0" encoding="UTF-8"?>

<programs xmlns="http://someUrl.nl/schema/enterprise/program">

<program xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://someUrl.nl/schema/enterprise/program http://someUrl.nl/schema/enterprise/program.xsd">

<customizableOnRequest>true</customizableOnRequest>

<editor>webmaster@url</editor>

<expires>2019-04-21</expires>

<format>Edu-dex 1.0</format>

<generator>www.Url.com</generator>

<includeInCatalog>Catalogs</includeInCatalog>

<inPublication>true</inPublication>

<lastEdited>2019-04-12T20:03:09Z</lastEdited>

<programAdmission>

    <applicationOpen>true</applicationOpen>

    <applicationType>individual</applicationType>

    <maxNumberOfParticipants>12</maxNumberOfParticipants>

    <minNumberOfParticipants>8</minNumberOfParticipants>

    <paymentDue>up-front</paymentDue>

    <requiredLevel>academic bachelor</requiredLevel>

    <startDateDetermination>fixed starting date</startDateDetermination>

</programAdmission>

<programCurriculum>

    <instructionMode>training</instructionMode>

    <teacher>

        <id>{D83FFC12-0863-44A6-BDBB-ED618627F09D}</id>

        <name>SomeName</name>

        <summary xml:lang="nl">

        Long text of the summary. Not needed.

        </summary>

    </teacher>

    <studyLoad period="hour">26</studyLoad>

</programCurriculum>


茅侃侃
1 answer

皈依舞

Try the code below (55703748.xml contains the XML you posted):

import xml.etree.ElementTree as ET

tree = ET.parse('55703748.xml')
root = tree.getroot()
nodes = root.findall(".//{http://someUrl.nl/schema/enterprise/program}programSummaryText")
for node in nodes:
    print(node.text)

Output:

short Program Course Name summary
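For the programDescriptionText xml:lang="nl" part of the question, a similar approach might work. The sketch below is only an illustration under assumptions: the posted excerpt ends before programDescriptions, so the layout of that element in SAMPLE is guessed, and the helper name program_rows and the column names are made up for this example. It relies on the fact that the reserved xml: prefix always maps to the http://www.w3.org/XML/1998/namespace URI, so the attribute can be read with a fully qualified key.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the posted file; "p" is an arbitrary local prefix.
NS = {"p": "http://someUrl.nl/schema/enterprise/program"}
# The reserved xml: prefix always maps to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, shortened sample. The programDescriptions layout is assumed,
# because the posted excerpt ends before that element.
SAMPLE = """<programs xmlns="http://someUrl.nl/schema/enterprise/program">
  <program>
    <editor>webmaster@url</editor>
    <expires>2019-04-21</expires>
    <programDescriptions>
      <programDescriptionText xml:lang="nl">Voorbeeldtekst</programDescriptionText>
      <programDescriptionText xml:lang="en">Example text</programDescriptionText>
    </programDescriptions>
  </program>
</programs>"""


def program_rows(root):
    """Return one dict per <program> element (hypothetical helper)."""
    rows = []
    for program in root.findall("p:program", NS):
        description = None
        path = "p:programDescriptions/p:programDescriptionText"
        for node in program.findall(path, NS):
            # Keep only the Dutch description text.
            if node.get(XML_LANG) == "nl":
                description = (node.text or "").strip()
                break
        rows.append({
            "editor": program.findtext("p:editor", namespaces=NS),
            "expires": program.findtext("p:expires", namespaces=NS),
            "description_nl": description,
        })
    return rows


rows = program_rows(ET.fromstring(SAMPLE))
print(rows)
```

Passing rows to pandas.DataFrame(rows) should then give one row per program, with the dict keys as column names. Collecting plain dicts first keeps the XML handling separate from pandas, which makes it easier to add further fields later.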