Removing subelements from an XML file in Python

Problem description

I am learning how to modify XML files in Python with the lxml library and ElementTree. After some work, I ended up with this XML file:

<component xmlns:xsi="http://www.w3.orgr">
  <memoryMaps>
    <memoryMap>
      <name>name</name>
      <description>description</description>
      <peripheral>
        <name>periph</name>
        <description>description</description>
        <baseAddress>0x0</baseAddress>
        <range>0x8</range>
        <width>32</width>
        <register>
          <name>reg1</name>
          <displayName>reg1</displayName>
          <description>This is register 1</description>
          <addressOffset>0x0</addressOffset>
          <size>32</size>
          <access>read-write</access>
          <resetValue>0x00000002</resetValue>
          <resetMask>0xFFFFFFFF</resetMask>
          <fields>
            .................
          </fields>
          <resetValue>0x00000002</resetValue>
          <resetMask>0xFFFFFFFF</resetMask>
          <description>This is register 1</description>
        </register>
        <register>
          .................
        </register>
        <register>
          ..................
        </register>
      </peripheral>
    </memoryMap>
  </memoryMaps>
</component>

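For each "register" node I created three new subelements ("resetValue", "resetMask" and "description") with ET.SubElement, and then used element.insert to place them at the new positions shown above. Now I need to remove the extra ones from the end of the register node so that it looks like this: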

<register>
  <name>reg1</name>
  <displayName>reg1</displayName>
  <description>This is register 1</description>
  <addressOffset>0x0</addressOffset>
  <size>32</size>
  <access>read-write</access>
  <resetValue>0x00000002</resetValue>
  <resetMask>0xFFFFFFFF</resetMask>
  <fields>
    .................
  </fields>
</register>
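A minimal sketch of one way to drop the extras, assuming `ET` is the standard-library `xml.etree.ElementTree` (the duplicated output suggests so). Its `insert()` does not detach an element that is already a child, so after `SubElement` (which appends) followed by `insert`, the same element object sits in the register's child list twice; lxml behaves differently and moves the element instead. Because the leftovers are the last three children, deleting them by position with a slice removes exactly those entries. The register below is shortened and its values are placeholders:

```python
import xml.etree.ElementTree as ET

# Shortened register; text values are placeholders.
register = ET.fromstring(
    "<register><name>reg1</name><displayName>reg1</displayName>"
    "<addressOffset>0x0</addressOffset><size>32</size>"
    "<access>read-write</access><fields/></register>"
)

# Same steps as described above: SubElement appends at the end,
# then insert() adds a second reference to the same object.
v = ET.SubElement(register, "resetValue")
v.text = "0x00000002"
m = ET.SubElement(register, "resetMask")
m.text = "0xFFFFFFFF"
d = ET.SubElement(register, "description")
d.text = register.find(".//displayName").text
register.insert(5, v)
register.insert(6, m)
register.insert(2, d)

# The last three entries are the very same objects as v, m and d.
assert register[-3] is v and register[-2] is m and register[-1] is d

# Delete by position. register.remove(v) would remove the FIRST matching
# entry, i.e. the one already at the desired position.
del register[-3:]

print([child.tag for child in register])
```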

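I know that creating subelements, moving them to a new position and then trying to delete the leftovers is probably not the best approach. When I looked at the alternative, creating the element with ET.Element and then calling insert, I had trouble finding the index of the right position, so I chose this approach.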
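The index does not need to be hard-coded: it can be looked up from a sibling that is already in place. Below is a minimal sketch with the standard-library `xml.etree.ElementTree`; `make_element` and `insert_after` are hypothetical helper names, and the register and its values are shortened placeholders. `ET.Element` creates a detached element, so each new element is inserted exactly once and nothing has to be removed afterwards:

```python
import xml.etree.ElementTree as ET

def make_element(tag, text):
    """Hypothetical helper: create a detached element (no parent yet) with text."""
    elem = ET.Element(tag)
    elem.text = text
    return elem

def insert_after(parent, sibling_tag, new_elem):
    """Hypothetical helper: insert new_elem right after the first <sibling_tag> child."""
    sibling = parent.find(sibling_tag)
    if sibling is None:
        raise ValueError(f"<{parent.tag}> has no <{sibling_tag}> child")
    # list(parent) is the current child list; Elements compare by identity.
    parent.insert(list(parent).index(sibling) + 1, new_elem)

register = ET.fromstring(
    "<register><name>reg1</name><displayName>reg1</displayName>"
    "<addressOffset>0x0</addressOffset><size>32</size>"
    "<access>read-write</access><fields/></register>"
)

insert_after(register, "access", make_element("resetValue", "0x00000002"))
insert_after(register, "resetValue", make_element("resetMask", "0xFFFFFFFF"))
display_name = register.find("displayName").text
insert_after(register, "displayName", make_element("description", display_name))

print([child.tag for child in register])
```

Because the position is looked up from the live child list on every call, earlier inserts that shift the indices do not break later ones.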

Here is the part of the code I used (the text of the new subelements is taken from other elements):

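   import xml.etree.ElementTree as ET   # assumed: the standard-library module

   # register, value and mask are defined elsewhere in the script
   v = ET.SubElement(register, 'resetValue')   # SubElement appends to the end of <register>
   v.text = value

   m = ET.SubElement(register, 'resetMask')
   m.text = mask

   displayName = register.find('.//displayName').text
   d = ET.SubElement(register, 'description')
   d.text = displayName

   # place the new elements at their target positions
   register.insert(5, v)
   register.insert(6, m)
   register.insert(2, d)
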
(I hope this part of the code makes the problem clearer.)

Can anyone give me some advice?

Tags: python, xml

Solution


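There is an XML parsing library whose editing support is not very polished, but it is flexible. Here is an example for your reference.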

from simplified_scrapy.spider import SimplifiedDoc
xml = '''
<component xmlns:xsi="http://www.w3.orgr">
  <memoryMaps>
    <memoryMap>
      <name>name</name>
      <description>description</description>
      <peripheral>
        <name>periph</name>
        <description>description</description>
        <baseAddress>0x0</baseAddress>
        <range>0x8</range>
        <width>32</width>
        <register>
          <name>reg1</name>
          <displayName>reg1</displayName>
          <addressOffset>0x0</addressOffset>
          <size>32</size>
          <access>read-write</access>
          <fields>
          .................
          </fields>
        </register>   
      </peripheral>
    </memoryMap>
  </memoryMaps>
</component>
'''
doc = SimplifiedDoc(xml)  # create doc
registers = doc.selects('register')

l = len(registers)
while l > 0:
    l = l - 1
    register = registers[l]  # Start at the back: modifying from the front changes the document structure that later iterations depend on, so the results may not be as expected.

    v = doc.createElement('resetValue', str(l))
    m = doc.createElement('resetMask', str(l))
    d = doc.createElement('description', 'This is register '+str(l))

    access = register.access
    access.insertAfter('\n' + ' ' * 10 + m) # Handling line breaks and padding
    access.insertAfter('\n' + ' ' * 10 + v)

    displayName = register.displayName
    displayName.insertAfter('\n' + ' ' * 10 + d)

print(doc.html)

Result:

<component xmlns:xsi="http://www.w3.orgr">
  <memoryMaps>
    <memoryMap>
      <name>name</name>
      <description>description</description>
      <peripheral>
        <name>periph</name>
        <description>description</description>
        <baseAddress>0x0</baseAddress>
        <range>0x8</range>
        <width>32</width>
        <register>
          <name>reg1</name>
          <displayName>reg1</displayName>
          <description>This is register 0</description>
          <addressOffset>0x0</addressOffset>
          <size>32</size>
          <access>read-write</access>
          <resetValue>0</resetValue>
          <resetMask>0</resetMask>
          <fields>
          .................
          </fields>
        </register>   
      </peripheral>
    </memoryMap>
  </memoryMaps>
</component>
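For comparison, a sketch of the same transformation using only the standard library, assuming `xml.etree.ElementTree` on Python 3.9 or later (`ET.indent` was added in 3.9). The input is a shortened two-register version of the document, and the placeholder values follow the answer's `str(l)` scheme. `ET.indent` re-indents the whole tree, which takes the place of the manual `'\n' + ' ' * 10` padding:

```python
import xml.etree.ElementTree as ET

# Shortened version of the document from the question (two registers).
SOURCE = """<component>
  <memoryMaps>
    <memoryMap>
      <peripheral>
        <register>
          <name>reg1</name>
          <displayName>reg1</displayName>
          <addressOffset>0x0</addressOffset>
          <size>32</size>
          <access>read-write</access>
          <fields/>
        </register>
        <register>
          <name>reg2</name>
          <displayName>reg2</displayName>
          <addressOffset>0x4</addressOffset>
          <size>32</size>
          <access>read-write</access>
          <fields/>
        </register>
      </peripheral>
    </memoryMap>
  </memoryMaps>
</component>"""

def new_element(tag, text):
    """Create a detached element (not yet in the tree) with the given text."""
    elem = ET.Element(tag)
    elem.text = text
    return elem

root = ET.fromstring(SOURCE)

# findall returns a list before anything is changed, so editing inside the loop is safe.
for index, register in enumerate(root.findall(".//register")):
    access_pos = list(register).index(register.find("access"))
    register.insert(access_pos + 1, new_element("resetValue", str(index)))
    register.insert(access_pos + 2, new_element("resetMask", str(index)))
    name_pos = list(register).index(register.find("displayName"))
    register.insert(name_pos + 1, new_element("description", f"This is register {index}"))

ET.indent(root, space="  ")  # Python 3.9+: rewrites whitespace-only text and tails
print(ET.tostring(root, encoding="unicode"))
```

Since each position is looked up on the register being edited, this loop can run front to back without the reverse iteration used above.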
