猿问

How do I correctly get data out of this nested XML?

I have the following XML:


<?xml version="1.0" encoding="UTF-8"?>
<data>
  <columns>
    <Leftover index="5">Leftover</Leftover>
    <NODE5 index="6"></NODE5>
    <NODE6 index="7"></NODE6>
    <NODE8 index="9"></NODE8>
    <Nomenk__Nr_ index="2">Nomenk.
Nr.</Nomenk__Nr_>
    <Year index="8">2020</Year>
    <Name index="1">Name</Name>
    <Value_code index="3">Value code</Value_code>
  </columns>
  <records>
    <record index="1">
      <Leftover>Leftover</Leftover>
      <NODE5>Test1</NODE5>
      <NODE6>Test2</NODE6>
      <NODE8>Test3</NODE8>
      <Nomenk__Nr_></Nomenk__Nr_>
      <Name></Name>
      <Value_code></Value_code>
    </record>
  ... (it repeats itself with different values and the index value increments)
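Since the title asks how to get data out of this structure, here is a minimal sketch using the standard library's xml.etree.ElementTree (not lxml). It assumes the layout shown above: `columns` holds one element per column with an `index` attribute, and each `record` holds one child per column. The trimmed `SAMPLE` document and the names `order`, `labels` and `records` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the structure above (one record only).
SAMPLE = """<data>
  <columns>
    <Leftover index="5">Leftover</Leftover>
    <NODE5 index="6"></NODE5>
    <Name index="1">Name</Name>
    <Value_code index="3">Value code</Value_code>
  </columns>
  <records>
    <record index="1">
      <Leftover>Leftover</Leftover>
      <NODE5>Test1</NODE5>
      <Name></Name>
      <Value_code></Value_code>
    </record>
  </records>
</data>"""

root = ET.fromstring(SAMPLE)
columns = root.find("columns")

# Column tags ordered by their numeric index attribute.
order = [c.tag for c in sorted(columns, key=lambda c: int(c.get("index")))]

# Display label per column tag; empty elements have text None, mapped to "".
labels = {c.tag: (c.text or "").strip() for c in columns}

# Each record as a dict keyed by its index attribute: child tag -> text.
records = {
    r.get("index"): {child.tag: (child.text or "").strip() for child in r}
    for r in root.iter("record")
}

print(order)         # ['Name', 'Value_code', 'Leftover', 'NODE5']
print(records["1"])  # {'Leftover': 'Leftover', 'NODE5': 'Test1', 'Name': '', 'Value_code': ''}
```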

My code is:


import lxml
import lxml.etree as et
xml = open('C:\outputfile.xml', 'rb')
xml_content = xml.read()
tree = et.fromstring(xml_content)
for bad in tree.xpath("//records[@index=\'*\']/NODE5"):
  bad.getparent().remove(bad)     # here I grab the parent of the element to call the remove directly on it
result = (et.tostring(tree, pretty_print=True, xml_declaration=True))
f = open( 'outputxml.xml', 'w' )
f.write( str(result) )
f.close()

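What I need to do is delete NODE5, NODE6 and NODE8. I tried using a wildcard and then specifying a node (see line 6), but that doesn't seem to work... I also get a syntax error right after the loop, at the first character, but the code still runs.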
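A likely reason line 6 matches nothing: `//records[@index='*']/NODE5` asks for a `records` element whose `index` attribute equals the literal string `*`. In this XML, `records` has no `index` attribute (it is `record` that carries one), `*` inside quotes is not a wildcard, and `NODE5` is a child of `record`, not of `records`. The sketch below compares the predicates using the standard library's xml.etree.ElementTree and its limited XPath subset, so it runs without lxml; the trimmed `SAMPLE` is a hypothetical stand-in for the real file. In lxml's full XPath, an expression such as `//record/NODE5 | //record/NODE6 | //record/NODE8` should select the record nodes the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed sample of the question's XML (one column block, two records).
SAMPLE = """<data>
  <columns><NODE5 index="6"/><Name index="1">Name</Name></columns>
  <records>
    <record index="1"><NODE5>Test1</NODE5><Name/></record>
    <record index="21"><NODE5>Test11</NODE5><Name/></record>
  </records>
</data>"""

root = ET.fromstring(SAMPLE)

# '*' inside quotes is compared as a literal string, so nothing matches.
literal = root.findall(".//record[@index='*']/NODE5")

# [@index] only tests that the attribute exists; NODE5 sits under record.
with_attr = root.findall(".//record[@index]/NODE5")

# No predicate at all: every NODE5 in the document (columns and records).
everywhere = root.findall(".//NODE5")

print(len(literal), len(with_attr), len(everywhere))  # 0 2 3
```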


Another problem is that when the file is "exported", lxml then sets the encoding to ASCII.
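Two things in the posted code probably explain this. lxml's `et.tostring()` serialises to ASCII unless an `encoding` argument is passed (non-ASCII characters become numeric character references), and because it returns `bytes`, `f.write(str(result))` writes the Python repr `b'...'` into a text-mode file. With lxml, passing `encoding='UTF-8'` to `tostring()` and opening the file with `'wb'` should avoid both. The sketch below shows the same behaviour with the standard library's ElementTree so it runs without lxml; the sample text `Größe` and the temp-dir path are assumptions for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical element with non-ASCII text, so the encoding is visible.
root = ET.fromstring("<data><Name>Größe</Name></data>")

# Default encoding is US-ASCII: non-ASCII characters become character references.
ascii_bytes = ET.tostring(root)
print(ascii_bytes)  # b'<data><Name>Gr&#246;&#223;e</Name></data>'

# tostring() returns bytes; str() of bytes is the repr, starting with "b'".
print(str(ascii_bytes)[:2])  # b'

# Let ElementTree write bytes itself, with an explicit UTF-8 declaration.
out_path = os.path.join(tempfile.gettempdir(), "outputxml.xml")
ET.ElementTree(root).write(out_path, encoding="UTF-8", xml_declaration=True)

with open(out_path, "rb") as f:
    data = f.read()
print(data.splitlines()[0])
```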


UPDATE: I get this error on line 8:


    return = ...
    ^
SyntaxError: invalid syntax
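The traceback shows `return = ...`, while the posted line 8 reads `result = ...`. Assuming the script actually being run contains `return`, the error comes from assigning to a reserved keyword, and renaming the variable should fix it. A small check using only the standard library; the helper `compiles` is hypothetical and just wraps the built-in `compile()`.

```python
import keyword

# "return" is a reserved word, so it cannot be a variable name.
print(keyword.iskeyword("return"))  # True
print(keyword.iskeyword("result"))  # False

def compiles(source):
    """Return True if the snippet is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles("return = 1"))  # False
print(compiles("result = 1"))  # True
```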


犯罪嫌疑人X
112 views
1 answer

慕斯王

"What I need to do is remove NODE5, NODE6, NODE8." The following does that:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<data>
   <columns>
      <Leftover index="5">Leftover</Leftover>
      <NODE5 index="6" />
      <NODE6 index="7" />
      <NODE8 index="9" />
      <Nomenk__Nr_ index="2">Nomenk.Nr.</Nomenk__Nr_>
      <Year index="8">2020</Year>
      <Name index="1">Name</Name>
      <Value_code index="3">Value code</Value_code>
   </columns>
   <records>
      <record index="1">
         <Leftover>Leftover</Leftover>
         <NODE5>Test1</NODE5>
         <NODE6>Test2</NODE6>
         <NODE8>Test3</NODE8>
         <Nomenk__Nr_ />
         <Name />
         <Value_code />
      </record>
      <record index="21">
         <Leftover>Leftover</Leftover>
         <NODE5>Test11</NODE5>
         <NODE6>Test21</NODE6>
         <NODE8>Test39</NODE8>
         <Nomenk__Nr_ />
         <Name />
         <Value_code />
      </record>
   </records>
</data>'''

root = ET.fromstring(xml)

# Remove the NODE5/NODE6/NODE8 column definitions
col = root.find('./columns')
for x in ['5', '6', '8']:
    nodes_to_remove = col.findall('./NODE{}'.format(x))
    for node in nodes_to_remove:
        col.remove(node)

# Remove the same nodes from every record
records = root.find('./records')
records_lst = records.findall('./record')
for r in records_lst:
    for x in ['5', '6', '8']:
        nodes_to_remove = r.findall('./NODE{}'.format(x))
        for node in nodes_to_remove:
            r.remove(node)

ET.dump(root)

Output:

<data>
   <columns>
      <Leftover index="5">Leftover</Leftover>
      <Nomenk__Nr_ index="2">Nomenk.Nr.</Nomenk__Nr_>
      <Year index="8">2020</Year>
      <Name index="1">Name</Name>
      <Value_code index="3">Value code</Value_code>
   </columns>
   <records>
      <record index="1">
         <Leftover>Leftover</Leftover>
         <Nomenk__Nr_ />
         <Name />
         <Value_code />
      </record>
      <record index="21">
         <Leftover>Leftover</Leftover>
         <Nomenk__Nr_ />
         <Name />
         <Value_code />
      </record>
   </records>
</data>
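A variant of the same idea that does not hard-code `columns` and `records`: walk every parent once with `Element.iter()`, collect the unwanted children first, then remove them, so the tree is never mutated while it is being iterated. This is a sketch with standard-library ElementTree only; the helper name `remove_tags` and the `UNWANTED` set are hypothetical. Note that `Element.remove()` also drops the removed element's tail whitespace, so on Python 3.9+ `ET.indent(root)` can be used to re-pretty-print before writing.

```python
import xml.etree.ElementTree as ET

# Tags to drop, taken from the question.
UNWANTED = {"NODE5", "NODE6", "NODE8"}

def remove_tags(root, tags):
    """Remove every element whose tag is in `tags`, at any depth; return the count."""
    # Collect (parent, child) pairs first so the tree is not changed mid-iteration.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if child.tag in tags]
    for parent, child in doomed:
        parent.remove(child)
    return len(doomed)

SAMPLE = """<data>
  <columns><Leftover index="5">Leftover</Leftover><NODE5 index="6"/><NODE6 index="7"/><NODE8 index="9"/></columns>
  <records>
    <record index="1"><Leftover>Leftover</Leftover><NODE5>Test1</NODE5><NODE6>Test2</NODE6><NODE8>Test3</NODE8></record>
    <record index="21"><Leftover>Leftover</Leftover><NODE5>Test11</NODE5><NODE6>Test21</NODE6><NODE8>Test39</NODE8></record>
  </records>
</data>"""

root = ET.fromstring(SAMPLE)
removed = remove_tags(root, UNWANTED)
print(removed)  # 9
```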