How do I parse values from the same tag using ElementTree in Python?

Question

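I am using Python to parse an XML file, but I have run into a problem. I collect the values into a dictionary, but when two or more values are identical, they do not appear more than once. I am sure there is a way to solve this, but I am new to Python and to parsing XML.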

Here is a sample of the XML:

<Root>
<Child1>
</Child1>
<Child2>
    <Data DId = "1">
        <Group ID = "">
            <Sport Name="Cricket" Team="6" />
            <Sport Name="Football" Team="6" />
            <Sport Name="Hockey" Team="5" />
        </Group>
    </Data>
    <Data DId = "2">
        <Group ID = "">
            <Sport Name="Rugby" Team="6" />
            <Sport Name="Baseball" Team="10" />
            <Sport Name="Swimming" Team="6" />
        </Group>
    </Data>
</Child2>
</Root>

I want to get the attribute values of the Sport tags, grouped by their Data element. The code I tried is:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
dict1 = {}
for i in root.iter('Sport'):
    dict1[i.attrib['Name']] = [j.text for j in i]
    dict1[i.attrib['Team']] = [k.text for k in i]

print(dict1)

But I cannot get the Team value for each sport.

Tags: xml, parsing, python-3.6

Answer


Try the simplified_scrapy library.

from simplified_scrapy import SimplifiedDoc, utils
xml = '''
<Root>
<Child1>
</Child1>
<Child2>
    <Data DId = "1">
        <Group ID = "">
            <Sport Name="Cricket" Team="6" />
            <Sport Name="Football" Team="6" />
            <Sport Name="Hockey" Team="5" />
        </Group>
    </Data>
    <Data DId = "2">
        <Group ID = "">
            <Sport Name="Rugby" Team="6" />
            <Sport Name="Baseball" Team="10" />
            <Sport Name="Swimming" Team="6" />
        </Group>
    </Data>
</Child2>
</Root>
'''
# xml = utils.getFileContent('test.xml')
dict1 = {}
doc = SimplifiedDoc(xml)
datas = doc.selects('Data')
for i in datas:
    dic = {}
    for j in i.selects('Sport'):
        dic[j['Name']] = j['Team']
    dict1[i['DId']] = dic
print(dict1)

Result:

{'1': {'Cricket': '6', 'Football': '6', 'Hockey': '5'}, '2': {'Rugby': '6', 'Baseball': '10', 'Swimming': '6'}}
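
The same result can probably be reached with ElementTree alone, which is what the question asked about. The original attempt seems to fail for two reasons: each Sport element is self-closing, so looping over its children (`for j in i`) yields nothing and the lists come out empty, and using the Team value as a dictionary key means repeated values such as "6" overwrite one another. The sketch below reads the attributes with `Element.get` and builds one inner dictionary per Data element. The helper name `sports_by_data` and the variable `xml_text` are made up for this illustration, and the XML is inlined here rather than read from test.xml.

```python
import xml.etree.ElementTree as ET

# Sample XML from the question, inlined so the example is self-contained.
xml_text = """
<Root>
<Child1>
</Child1>
<Child2>
    <Data DId = "1">
        <Group ID = "">
            <Sport Name="Cricket" Team="6" />
            <Sport Name="Football" Team="6" />
            <Sport Name="Hockey" Team="5" />
        </Group>
    </Data>
    <Data DId = "2">
        <Group ID = "">
            <Sport Name="Rugby" Team="6" />
            <Sport Name="Baseball" Team="10" />
            <Sport Name="Swimming" Team="6" />
        </Group>
    </Data>
</Child2>
</Root>
""".strip()


def sports_by_data(root):
    """Map each Data element's DId to a {Name: Team} dict of its Sport tags.

    Hypothetical helper, not part of ElementTree.
    """
    result = {}
    for data in root.iter('Data'):
        # Name and Team are attributes, so read them with .get(), not .text.
        result[data.get('DId')] = {
            sport.get('Name'): sport.get('Team')
            for sport in data.iter('Sport')
        }
    return result


root = ET.fromstring(xml_text)
# For a file on disk, use: root = ET.parse('test.xml').getroot()
dict1 = sports_by_data(root)
print(dict1)
```

Keying the inner dictionary by Name keeps every sport, because the names are unique within each Data element in this sample even though the Team values repeat. If names could repeat as well, a list of (Name, Team) tuples would be a safer container.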
