How to extract a specific element from XML data using Python ElementTree

Problem description

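I'm trying to extract some data from a large XML output. Can someone help me extract the TotalMilliseconds value for the 99th percentile?

I'm trying to do this in Python with the ElementTree parser.

Currently I'm trying something like the following: it finds the 99th percentile and then tries to find TotalMilliseconds from the root at that level.

But this returns nothing. While debugging I can see it enters the 99 clause, but I'm a bit lost as to where to go from there.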

for item in root.findall('./TimeSpan/Latency/Bucket/Percentile'):
    if item.text == "99":
        totalMs = item.find('TotalMilliseconds').text
        print(totalMs)

<TimeSpan>
<Latency>
<Bucket>
<Percentile>96</Percentile>
<ReadMilliseconds>55.378</ReadMilliseconds>
<WriteMilliseconds>105.115</WriteMilliseconds>
<TotalMilliseconds>98.546</TotalMilliseconds>
</Bucket>
<Bucket>
<Percentile>97</Percentile>
<ReadMilliseconds>59.552</ReadMilliseconds>
<WriteMilliseconds>109.733</WriteMilliseconds>
<TotalMilliseconds>104.649</TotalMilliseconds>
</Bucket>
<Bucket>
<Percentile>98</Percentile>
<ReadMilliseconds>64.891</ReadMilliseconds>
<WriteMilliseconds>116.998</WriteMilliseconds>
<TotalMilliseconds>111.300</TotalMilliseconds>
</Bucket>
<Bucket>
<Percentile>99</Percentile>
<ReadMilliseconds>81.629</ReadMilliseconds>
<WriteMilliseconds>131.931</WriteMilliseconds>
<TotalMilliseconds>125.176</TotalMilliseconds>
</Bucket>
</Latency>
</TimeSpan>

Tags: python, xml, elementtree

Solution


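In the question's loop, item is a Percentile element. Percentile has no child elements, so item.find('TotalMilliseconds') returns None. TotalMilliseconds is a sibling of Percentile inside the same Bucket, so iterate over the Bucket elements instead and, for each one whose Percentile is 99, read its TotalMilliseconds. See below.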

import xml.etree.ElementTree as ET

data = """<?xml version="1.0" encoding="UTF-8"?>
<TimeSpan>
   <Latency>
      <Bucket>
         <Percentile>96</Percentile>
         <ReadMilliseconds>55.378</ReadMilliseconds>
         <WriteMilliseconds>105.115</WriteMilliseconds>
         <TotalMilliseconds>98.546</TotalMilliseconds>
      </Bucket>
      <Bucket>
         <Percentile>97</Percentile>
         <ReadMilliseconds>59.552</ReadMilliseconds>
         <WriteMilliseconds>109.733</WriteMilliseconds>
         <TotalMilliseconds>104.649</TotalMilliseconds>
      </Bucket>
      <Bucket>
         <Percentile>98</Percentile>
         <ReadMilliseconds>64.891</ReadMilliseconds>
         <WriteMilliseconds>116.998</WriteMilliseconds>
         <TotalMilliseconds>111.300</TotalMilliseconds>
      </Bucket>
      <Bucket>
         <Percentile>99</Percentile>
         <ReadMilliseconds>81.629</ReadMilliseconds>
         <WriteMilliseconds>131.931</WriteMilliseconds>
         <TotalMilliseconds>125.176</TotalMilliseconds>
      </Bucket>
   </Latency>
</TimeSpan>"""

root = ET.fromstring(data)
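# Collect the results in a list in case more than one Bucket reports the 99th percentile.
# A separate name is used so the XML string in `data` is not overwritten.
totals = [e.find('TotalMilliseconds').text for e in root.findall('.//Bucket') if e.find('Percentile').text == '99']
print(totals)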

Output

['125.176']
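
The same lookup can probably be written more compactly with the limited XPath predicates that ElementTree supports. The following is a minimal sketch, not part of the original answer: it uses a trimmed copy of the sample XML, and the names xml_text, matches and totals_ms are only illustrative. It also converts the text to float, on the assumption that a numeric value is what is wanted.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML from the question (two buckets only).
xml_text = """<TimeSpan>
  <Latency>
    <Bucket>
      <Percentile>98</Percentile>
      <TotalMilliseconds>111.300</TotalMilliseconds>
    </Bucket>
    <Bucket>
      <Percentile>99</Percentile>
      <TotalMilliseconds>125.176</TotalMilliseconds>
    </Bucket>
  </Latency>
</TimeSpan>"""

root = ET.fromstring(xml_text)

# The predicate [Percentile='99'] keeps only Bucket elements whose
# Percentile child has exactly the text "99"; the final step then
# selects the TotalMilliseconds child of each matching Bucket.
matches = root.findall(".//Bucket[Percentile='99']/TotalMilliseconds")

# Convert the element text to float (assumes every match holds a number).
totals_ms = [float(m.text) for m in matches]
print(totals_ms)
```

Note that the predicate compares the complete text of Percentile, so a value with surrounding whitespace such as " 99 " would not match. In that case the list comprehension above, with a .strip() added to the comparison, is the safer choice.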
