Simplifying XML by keeping only 1 child of each node at depth 2

Problem description

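I have a complex XML file whose structure is hard to see because every node at level 2 has thousands of children. I would like to truncate the XML like this: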

<main>
  <a type="x">
    <b>
    <b>  # should be deleted
    <b>  # should be deleted
    # thousand others
  </a>
  <a type="y">
    <c>
    <c>  # should be deleted
    # many others
  </a>
  <a type="z">
    <d>
    <d>  # should be deleted
    # many others
  </a>
</main>

How can I keep only one child for each node at level 2, 3, etc., and export the result?

I tried this, but nothing seems to be removed:

import xml.etree.ElementTree as ET
tree = ET.parse('in.xml')
root = tree.getroot()    

for l1 in root:
    print(l1, l1.tag, l1.attrib)
    for i, l2 in enumerate(l1):
        print(i, l2)
        if i > 0:
            l1.remove(l2)    # nothing seems removed, why?

tree.write('out.xml')            

Tags: python, xml

Solution


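The problem probably comes from the fact that I was modifying a list while iterating over it: each removal shifts the remaining children, so the loop skips some of them. Iterating over a copy, list(l1), solves it: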

for l1 in root:
    for l2 in list(l1)[1:]:
        l1.remove(l2)
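
To make the difference concrete, here is a minimal, self-contained sketch that runs both variants on a small in-memory document. The SAMPLE string and the keep_first_child helper are made up for illustration and are not part of the original question; the real file would be loaded with ET.parse('in.xml') as above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for in.xml, small enough to inspect by eye.
SAMPLE = """<main>
  <a type="x"><b/><b/><b/><b/></a>
  <a type="y"><c/><c/></a>
  <a type="z"><d/><d/><d/></a>
</main>"""


def keep_first_child(root):
    """Remove all but the first child of each direct child of root."""
    for parent in root:
        # list(parent) is a snapshot, so removing from parent
        # does not disturb the loop.
        for child in list(parent)[1:]:
            parent.remove(child)


# Variant from the question: removing while iterating over the live
# element. Some siblings are skipped, so more than one child survives.
buggy = ET.fromstring(SAMPLE)
first = buggy[0]
for i, child in enumerate(first):
    if i > 0:
        first.remove(child)
buggy_count = len(first)

# Variant from the answer: iterate over a copy of the child list.
fixed = ET.fromstring(SAMPLE)
keep_first_child(fixed)
fixed_counts = [len(a) for a in fixed]

print(buggy_count, fixed_counts)
print(ET.tostring(fixed, encoding="unicode"))
```

With a parsed file, the same helper would be called as keep_first_child(tree.getroot()) before tree.write('out.xml').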
