Sorting an XML document with Python and ElementTree

Problem description

I am trying to reorganize some XML files that contain several segments of a complete route. They are structured like this:

<trk>
    <name>GPSRoute.XML</name>
    <trkseg>
        <trkpt lat="37.077882" lon="-112.242785">
            <ele>1688.00</ele>
            <time>2020-04-18T01:56:39.80Z</time>
        </trkpt>
        <extensions>
            <name>14</name>
            <gte:color>#00ce00</gte:color>
        </extensions>
    </trkseg>
    <trkseg>
        <trkpt lat="37.077888" lon="-112.242783">
            <ele>1688.00</ele>
            <time>2020-04-18T01:56:39.80Z</time>
        </trkpt>
        <extensions>
            <name>1</name>
            <gte:color>#00ce00</gte:color>
        </extensions>
    </trkseg>
</trk>

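I want to sort the file by name instead of by its current time ordering, and write the result to a new file. This is as far as I have gotten: the code successfully collects the names into a list, but it fails on data.sort():

"TypeError: '<' not supported between instances of 'xml.etree.ElementTree.Element' and 'xml.etree.ElementTree.Element'"

If anyone could point me in the right direction, it would be much appreciated!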

import xml.etree.ElementTree as ET

tree = ET.parse('Filename.xml')

root = tree.getroot()
data = []
for track in root:
    for segment in track:
        for extension in segment:
            for name in extension.findall('name'):
                print(name.text)
                data.append((name))
            data.sort()


tree.write('Sorted.xml')

Tags: python, xml, sorting, xml-parsing, elementtree

Solution


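I don't think XPath offers a real way to sort XML before version 3.1, which the Python libraries used here do not support, but you can work around that yourself.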
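Part of the workaround is understanding the TypeError from the question: Element objects define no ordering, so a list of them cannot be sorted unless you tell Python what to compare. The following is a minimal sketch, assuming the `<name>` values are always integers; the three sample elements are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Three stand-alone <name> elements, mimicking what the question's loop collects.
names = [ET.fromstring(f"<name>{n}</name>") for n in (14, 1, 2)]

try:
    names.sort()  # Elements do not implement "<", so this raises TypeError
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True

# Supplying a key function tells sort() to compare the numeric text instead.
names.sort(key=lambda el: int(el.text))
ordered = [el.text for el in names]
print(plain_sort_failed, ordered)
```

Converting with `int` matters here: sorting the raw strings would place "14" before "2".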

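Note that because the XML in your question is invalid (it uses an undeclared namespace prefix, gte), I used the more forgiving HTML parser. For your actual code you should use an XML parser, as the comments in the lxml code indicate.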
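To illustrate the namespace issue, here is a small sketch using the standard library's ElementTree. The URI `http://example.com/gte` is a placeholder I made up; your real file should declare the actual URI for the `gte` prefix somewhere on an enclosing element.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the question's snippet never declares one.
GTE_NS = "http://example.com/gte"

undeclared = "<extensions><name>14</name><gte:color>#00ce00</gte:color></extensions>"
try:
    ET.fromstring(undeclared)
    prefix_error = None
except ET.ParseError as err:
    prefix_error = err  # a strict XML parser rejects the unbound "gte" prefix

declared = (
    f'<extensions xmlns:gte="{GTE_NS}">'
    "<name>14</name><gte:color>#00ce00</gte:color></extensions>"
)
ext = ET.fromstring(declared)
# Namespaced children are looked up through a prefix-to-URI mapping.
color = ext.find("gte:color", {"gte": GTE_NS}).text
print(prefix_error is not None, ext.find("name").text, color)
```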

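The lxml code below collects the text of every <name> node (that is, your target number) found under each <trkseg> parent, saves those values in a list, sorts the list, uses the sorted values to select the <trkseg> nodes again in sorted order, and concatenates them (together with the opening and closing tags) into a new XML string.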

import lxml.html as lh # with actual xml you would probably use "from lxml import etree"
trk = """your xml above"""

doc = lh.fromstring(trk) # with actual xml you should probably use "doc = etree.XML(trk)"

names = []
new_trk = """<trk>
    <name>GPSRoute.XML</name>""" # this is the preamble which is left untouched
for nam in doc.xpath('//extensions//name'):
    names.append(nam.text) #grab the numbers
for name in sorted(names, key=int): #sort the grabbed numbers numerically, so "2" comes before "14"
    target = doc.xpath(f'//trkseg[.//name/text()={name}]')
    for t in target:
        new_trk += lh.tostring(t).decode()
new_trk += '</trk>' # append the closing tag, which is also left untouched
print(new_trk)

Output:

<trk>
    <name>GPSRoute.XML</name><trkseg>
        <trkpt lat="37.077888" lon="-112.242783">
            <ele>1688.00</ele>
            <time>2020-04-18T01:56:39.80Z</time>
        </trkpt>
        <extensions>
            <name>1</name>
            <color>#00ce00</color>
        </extensions>
    </trkseg>
<trkseg>
        <trkpt lat="37.077882" lon="-112.242785">
            <ele>1688.00</ele>
            <time>2020-04-18T01:56:39.80Z</time>
        </trkpt>
        <extensions>
            <name>14</name>
            <color>#00ce00</color>
        </extensions>
    </trkseg>
    </trk>
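Since the question asks about ElementTree specifically, the same idea can probably be done with the standard library alone by reordering the <trkseg> elements in place. This is only a sketch under several assumptions: the `gte` prefix is declared (the URI below is a placeholder), <trk> is the root element, the <name> values are integers, and the file has no default namespace (real GPX files usually do, in which case the tag names would need to be qualified). The helper name `segment_number` and the trimmed sample data are mine.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the question never declares the "gte" prefix.
GTE_NS = "http://example.com/gte"
ET.register_namespace("gte", GTE_NS)  # keep the "gte:" prefix when serializing

xml_text = f"""<trk xmlns:gte="{GTE_NS}">
    <name>GPSRoute.XML</name>
    <trkseg><extensions><name>14</name><gte:color>#00ce00</gte:color></extensions></trkseg>
    <trkseg><extensions><name>1</name><gte:color>#00ce00</gte:color></extensions></trkseg>
    <trkseg><extensions><name>2</name><gte:color>#00ce00</gte:color></extensions></trkseg>
</trk>"""

def segment_number(seg):
    """Return the numeric <name> stored under a segment's <extensions>."""
    return int(seg.findtext("extensions/name"))

trk = ET.fromstring(xml_text)

# Detach every <trkseg>, then re-attach them in sorted order.
# The track's own <name> child is not a <trkseg>, so it stays first.
segments = trk.findall("trkseg")
for seg in segments:
    trk.remove(seg)
trk.extend(sorted(segments, key=segment_number))

sorted_names = [seg.findtext("extensions/name") for seg in trk.findall("trkseg")]
print(sorted_names)
result = ET.tostring(trk, encoding="unicode")
```

To save the result instead of building a string, something like `ET.ElementTree(trk).write('Sorted.xml')` should work. With a file parsed via `ET.parse`, you would apply the same reordering to each track element and then call `tree.write('Sorted.xml')` as in the question.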
