Parsing an XML document with Python

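Question

I have an XML document that is fairly complex (at least for me) and holds several pieces of information. I tried using the lxml library to get the job done, but I ran into difficulties.

The XML document I have is very similar to the one below: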

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="MeasDataCollection.xsl"?>
    <measCollecFile
        xmlns="http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec">
        <fileHeader fileFormatVersion="32.435 V8.0.0"
            vendorName="Nokia">
            <fileSender
                localDn="MCC=096,MNC=724,ManagedElement=SAEGW01LEM"
                elementType="pgw instance 1" />
            <measCollec beginTime="2019-05-14T12:00:01-03:00" />
        </fileHeader>
        <measData>
            <managedElement
                localDn="MCC=096,MNC=724,ManagedElement=SAEGW01LEM"
                swVersion="C-10.0.R9" />
            <measInfo measInfoId="KPISystemCP-ISA">
                <granPeriod duration="PT300S" endTime="2019-05-14T12:05:01-03:00" />
                <measType p="1">VS.avgCpuUtilization</measType>
                <measType p="2">VS.avgMemoryUtilization</measType>
                <measType p="3">VS.avgMemoryUtilization1M</measType>
                <measType p="4">VS.SDFsFpUtilization</measType>
                <measType p="5">VS.SDFsLcpUtilization</measType>
                <measType p="6">VS.avgVmFpCpuNicUsage</measType>
                <measType p="7">VS.avgVmFpCpuWorkerUsage</measType>
                <measType p="8">VS.avgVmFpCpuSchedulerUsage</measType>
                <measType p="9">VS.avgVmFpCpuCollapsedUsage</measType>
                <measType p="10">VS.avgVmFpCpuCombinedUsage</measType>
                <measType p="11">VS.hwCfgBitsInfo</measType>
                <measValue measObjLdn="KPI=System,GroupName=CP-ISA,group=1,slot=3,mda=1">
                    <r p="1">1</r>
                    <r p="2">72</r>
                    <r p="3">72</r>
                    <r p="4">0.00</r>
                    <r p="5">0.00</r>
                    <r p="6">0.00</r>
                    <r p="7">0.05</r>
                    <r p="8">0.00</r>
                    <r p="9">0.00</r>
                    <r p="10">0.00</r>
                    <r p="11">4</r>
                    <suspect>false</suspect>
                </measValue>
            </measInfo>

I would like to know how to access the value of VS.avgMemoryUtilization1M using Python.

I know the value of VS.avgMemoryUtilization1M is 72, but how can I access it from Python using the lxml library?

Tags: python, xml

Answer

You can use BeautifulSoup to parse the XML data (the advantages are that you can use CSS selectors and that it tolerates malformed XML):

from bs4 import BeautifulSoup

data = '''    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="MeasDataCollection.xsl"?>
    <measCollecFile
        xmlns="http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec">
        <fileHeader fileFormatVersion="32.435 V8.0.0"
            vendorName="Nokia">
            <fileSender
                localDn="MCC=096,MNC=724,ManagedElement=SAEGW01LEM"
                elementType="pgw instance 1" />
            <measCollec beginTime="2019-05-14T12:00:01-03:00" />
        </fileHeader>
        <measData>
            <managedElement
                localDn="MCC=096,MNC=724,ManagedElement=SAEGW01LEM"
                swVersion="C-10.0.R9" />
            <measInfo measInfoId="KPISystemCP-ISA">
                <granPeriod duration="PT300S" endTime="2019-05-14T12:05:01-03:00" />
                <measType p="1">VS.avgCpuUtilization</measType>
                <measType p="2">VS.avgMemoryUtilization</measType>
                <measType p="3">VS.avgMemoryUtilization1M</measType>
                <measType p="4">VS.SDFsFpUtilization</measType>
                <measType p="5">VS.SDFsLcpUtilization</measType>
                <measType p="6">VS.avgVmFpCpuNicUsage</measType>
                <measType p="7">VS.avgVmFpCpuWorkerUsage</measType>
                <measType p="8">VS.avgVmFpCpuSchedulerUsage</measType>
                <measType p="9">VS.avgVmFpCpuCollapsedUsage</measType>
                <measType p="10">VS.avgVmFpCpuCombinedUsage</measType>
                <measType p="11">VS.hwCfgBitsInfo</measType>
                <measValue measObjLdn="KPI=System,GroupName=CP-ISA,group=1,slot=3,mda=1">
                    <r p="1">1</r>
                    <r p="2">72</r>
                    <r p="3">72</r>
                    <r p="4">0.00</r>
                    <r p="5">0.00</r>
                    <r p="6">0.00</r>
                    <r p="7">0.05</r>
                    <r p="8">0.00</r>
                    <r p="9">0.00</r>
                    <r p="10">0.00</r>
                    <r p="11">4</r>
                    <suspect>false</suspect>
                </measValue>
            </measInfo>'''

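# The 'xml' feature selects the lxml-backed XML parser, so lxml must be installed.
soup = BeautifulSoup(data, 'xml')
# Find the <measType> whose text contains the counter name and read its p attribute.
# Note: newer soupsieve releases deprecate :contains() in favor of :-soup-contains().
p = soup.select_one('measType[p]:contains("VS.avgMemoryUtilization1M")')['p']
# The <r> element carrying the same p attribute holds the measured value.
print('Value of `VS.avgMemoryUtilization1M`={}'.format(soup.select_one('r[p="{}"]'.format(p)).text))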

Prints:

Value of `VS.avgMemoryUtilization1M`=72
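
Since the question asks about lxml specifically, here is a minimal sketch of the same lookup using the ElementTree API. It uses the standard library's `xml.etree.ElementTree`; as far as I know, `lxml.etree` accepts the same calls used here, so swapping the import should work. The sample is a trimmed copy of the question's document with the closing tags added (an assumption, since the posted XML is cut off), and `counter_values` is a hypothetical helper name, not part of either library. The key point is that the document declares a default namespace, so every tag in a path has to be qualified with a prefix mapped to that namespace.

```python
import xml.etree.ElementTree as ET  # assumption: lxml.etree can be swapped in here

# Trimmed, well-formed version of the document from the question
# (closing tags added, only three counters kept).
SAMPLE = """\
<measCollecFile xmlns="http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec">
  <measData>
    <measInfo measInfoId="KPISystemCP-ISA">
      <measType p="1">VS.avgCpuUtilization</measType>
      <measType p="2">VS.avgMemoryUtilization</measType>
      <measType p="3">VS.avgMemoryUtilization1M</measType>
      <measValue measObjLdn="KPI=System,GroupName=CP-ISA,group=1,slot=3,mda=1">
        <r p="1">1</r>
        <r p="2">72</r>
        <r p="3">72</r>
        <suspect>false</suspect>
      </measValue>
    </measInfo>
  </measData>
</measCollecFile>
"""

# Prefix "m" is arbitrary; it only has to map to the document's default namespace.
NS = {"m": "http://www.3gpp.org/ftp/specs/archive/32_series/32.435#measCollec"}


def counter_values(xml_text, counter_name):
    """Return {measObjLdn: value} for the named counter (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    results = {}
    for info in root.iterfind(".//m:measInfo", NS):
        # Map each counter name to its position attribute p within this measInfo.
        positions = {
            (t.text or "").strip(): t.get("p")
            for t in info.iterfind("m:measType", NS)
        }
        p = positions.get(counter_name)
        if p is None:
            continue  # this measInfo does not report the counter
        for value in info.iterfind("m:measValue", NS):
            for r in value.iterfind("m:r", NS):
                if r.get("p") == p:
                    results[value.get("measObjLdn")] = r.text
    return results


print(counter_values(SAMPLE, "VS.avgMemoryUtilization1M"))
```

Unlike the substring match in the CSS selector, this compares the counter name exactly, so asking for VS.avgMemoryUtilization does not accidentally pick up VS.avgMemoryUtilization1M. The values come back as strings; convert with `int()` or `float()` as needed.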
