Is there a way to create an XML element tree?

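I am currently writing some XSDs and DTDs to validate a set of XML files. I am writing them by hand because I have had very bad experiences with XSD generators (Oxygen, for example).

However, I already have a sample XML file to work from, and it is huge: for example, one element has 4312 child elements.

Because of those bad experiences with XSD generators, I would like to create a kind of XML tree that contains only the unique tags and attributes, so that I do not have to wade through repeated elements when I look at the XML while writing an XSD.

That is, I have this XML (provided by W3):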

<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food some_attribute="1.0">
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>
    Two of our famous Belgian Waffles with plenty of real maple syrup
    </description>
    <calories>650</calories>
</food>
<food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>
    Light Belgian waffles covered with strawberries and whipped cream
    </description>
    <calories>900</calories>
</food>
<food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>
    Belgian waffles covered with assorted fresh berries and whipped cream
    </description>
    <calories>900</calories>
</food>
<food>
    <name>French Toast</name>
    <price>$4.50</price>
    <description>
    Thick slices made from our homemade sourdough bread
    </description>
    <calories>600</calories>
    <some_complex_type_element_1>
      <some_simple_type_element_1>Text.</some_simple_type_element_1>
    </some_complex_type_element_1>
</food>
<food>
    <name>Homestyle Breakfast</name>
    <price>$6.95</price>
    <description>
    Two eggs, bacon or sausage, toast, and our ever-popular hash browns
    </description>
    <calories>950</calories>
    <some_simple_type_element_2>Text.</some_simple_type_element_2>
</food>
</breakfast_menu>

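As you can see, there are 4 kinds of unique elements under the root element.

These are:

  • element 1 (has an attribute),
  • elements 2 and 3,
  • element 4 (contains another complex-type element),
  • element 5 (contains another simpleType element).

What I want to achieve is some kind of tree representation of this XML that contains only the unique elements and no text.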
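One way to make "unique" precise is to compare elements by shape only: tag name, attribute names, and the shapes of their children, ignoring all text. The following is a minimal sketch of that idea, assuming Python's standard `xml.etree.ElementTree`; the helper names `signature` and `unique_children` and the trimmed-down `SAMPLE` document are illustrative and not part of the original post.

```python
import xml.etree.ElementTree as ET


def signature(elem):
    """Hashable description of an element's shape.

    Uses the tag, the sorted attribute names, and the signatures of the
    children in document order. Text and attribute values are ignored.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib)),
        tuple(signature(child) for child in elem),
    )


def unique_children(root):
    """Return the first child of root for each distinct signature."""
    seen = {}
    for child in root:
        seen.setdefault(signature(child), child)
    return list(seen.values())


# Trimmed-down, hypothetical sample in the spirit of the XML above.
SAMPLE = """<breakfast_menu>
  <food some_attribute="1.0"><name>a</name><price>b</price></food>
  <food><name>c</name><price>d</price></food>
  <food><name>e</name><price>f</price></food>
  <food><name>g</name><price>h</price><extra><inner>Text.</inner></extra></food>
</breakfast_menu>"""

root = ET.fromstring(SAMPLE)
shapes = unique_children(root)
print(len(shapes))  # 3
```

Here the two plain `<food>` elements collapse into one shape, while the one with an attribute and the one with an extra nested element each count separately. Child order is part of the signature, so elements with the same children in a different order would be treated as different shapes.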


慕的地6264312
1 answer

小唯快跑啊

See if this meets your needs.

from simplified_scrapy import SimplifiedDoc, utils

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
    <food some_attribute="1.0">
        <name>Belgian Waffles</name>
        <price>$5.95</price>
        <description>
        Two of our famous Belgian Waffles with plenty of real maple syrup
        </description>
        <calories>650</calories>
    </food>
    <food>
        <name>Strawberry Belgian Waffles</name>
        <price>$7.95</price>
        <description>
        Light Belgian waffles covered with strawberries and whipped cream
        </description>
        <calories>900</calories>
    </food>
    <food>
        <name>Berry-Berry Belgian Waffles</name>
        <price>$8.95</price>
        <description>
        Belgian waffles covered with assorted fresh berries and whipped cream
        </description>
        <calories>900</calories>
    </food>
    <food>
        <name>French Toast</name>
        <price>$4.50</price>
        <description>
        Thick slices made from our homemade sourdough bread
        </description>
        <calories>600</calories>
        <some_complex_type_element_1>
        <some_simple_type_element_1>Text.</some_simple_type_element_1>
        </some_complex_type_element_1>
    </food>
    <food>
        <name>Homestyle Breakfast</name>
        <price>$6.95</price>
        <description>
        Two eggs, bacon or sausage, toast, and our ever-popular hash browns
        </description>
        <calories>950</calories>
        <some_simple_type_element_2>Text.</some_simple_type_element_2>
    </food>
</breakfast_menu>'''

def loop(node):
    para = {}
    for k in node:
        if k=='tag' or k=='html': continue
        para[k] = ''
    if para: node.setAttrs(para) # Blank out attribute values (names are kept)
    children = node.children
    if children:
        for c in children:
            loop(c)
    else:
        if node.text:
            node.setContent('') # Remove the text value

doc = SimplifiedDoc(xml)
# Blank out text values and attribute values
loop(doc.breakfast_menu)
dicNode = {}
for node in doc.breakfast_menu.children:
    key = node.outerHtml
    if dicNode.get(key):
        node.remove() # Delete duplicate
    else:
        dicNode[key] = True
print(doc.html)

Result:

<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
    <food some_attribute="">
        <name></name>
        <price></price>
        <description></description>
        <calories></calories>
    </food>
    <food>
        <name></name>
        <price></price>
        <description></description>
        <calories></calories>
    </food>
    <food>
        <name></name>
        <price></price>
        <description></description>
        <calories></calories>
        <some_complex_type_element_1>
        <some_simple_type_element_1></some_simple_type_element_1>
        </some_complex_type_element_1>
    </food>
    <food>
        <name></name>
        <price></price>
        <description></description>
        <calories></calories>
        <some_simple_type_element_2></some_simple_type_element_2>
    </food>
</breakfast_menu>

For large files, try the following approach. It reads the file in a streaming fashion, strips text and attribute values from each <food> element with regular expressions, and appends each distinct result to dest.xml.

from simplified_scrapy import SimplifiedDoc, utils
from simplified_scrapy.core.regex_helper import replaceReg

filePath = 'test.xml'
doc = SimplifiedDoc()
doc.loadFile(filePath, lineByline=True)
utils.appendFile('dest.xml','<?xml version="1.0" encoding="UTF-8"?><breakfast_menu>')
dicNode = {}
for node in doc.getIterable('food'):
    key = node.outerHtml
    key = replaceReg(key, '>[^>]*?<', '><')    # Drop text between tags
    key = replaceReg(key, '"[^"]*?"', '""')    # Blank out attribute values
    if not dicNode.get(key):
        dicNode[key] = True
        utils.appendFile('dest.xml', key)
utils.appendFile('dest.xml', '</breakfast_menu>')
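If installing simplified_scrapy is not an option, a similar skeleton can be sketched with the standard library alone. This is a different technique from the code above: instead of keeping one `<food>` per distinct shape, it merges all same-named siblings into a single node that carries the union of their attributes and child tags, which may be closer to what is needed when writing an XSD by hand. It assumes `xml.etree.ElementTree.iterparse` for streaming; the names `skeleton_from_stream` and `SAMPLE_XML` are illustrative, and the sample is a shortened stand-in for the real file.

```python
import io
import xml.etree.ElementTree as ET


def skeleton_from_stream(source):
    """Build a text-free tree holding each tag once per parent.

    source may be a file name or a binary file object. Attribute names
    are kept with empty values; text content is never copied.
    """
    stack = []          # skeleton nodes for the currently open elements
    skeleton_root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if stack:
                parent = stack[-1]
                # Reuse the skeleton child with the same tag, if any.
                node = next((c for c in parent if c.tag == elem.tag), None)
                if node is None:
                    node = ET.SubElement(parent, elem.tag)
            else:
                node = skeleton_root = ET.Element(elem.tag)
            for name in elem.attrib:
                node.set(name, "")
            stack.append(node)
        else:
            stack.pop()
            elem.clear()  # release text and children of the parsed element
    return skeleton_root


# Shortened, hypothetical stand-in for the real input file.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
  <food some_attribute="1.0"><name>a</name><price>b</price></food>
  <food><name>c</name><price>d</price></food>
  <food><name>e</name><price>f</price><extra><inner>Text.</inner></extra></food>
</breakfast_menu>"""

skeleton = skeleton_from_stream(io.BytesIO(SAMPLE_XML))
print(ET.tostring(skeleton, encoding="unicode"))
```

For a file on disk, passing the path (for example `skeleton_from_stream('test.xml')`) should work the same way. Note that the merged tree no longer shows which children or attributes are optional, so that still has to be checked against the source XML when writing the schema.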