python: Parsing XML fields

Problem description

With the Python 3 script below, I can parse the XML records and convert them to a list by extracting the value attribute of each field.

Please help me improve it so that it prints each field from the XML records as name:value.

For example, given this fragment:

<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>

Current output:

RESGJG, PY

Desired output:

RecordType:RESGJG, RecordTypeHEC:PY

My input file is dummy.xml. Note that it contains two records, and each record starts with record source="AJS/SHD".

<?xml version="1.0" encoding="UTF-8"?>
<records>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>
<field name="NodeID" value="rock.dsjjgds.cm"/>
<field name="SequenceNumber" value="7937973"/>
<field name="StartDate" value="20171049979"/>
<field name="EndDate" value="201704059739793"/>
<field name="CallDuration" value="973979i"/>
<field name="CauseForRecordClosing" value="normal"/>
</group>
<group name="SIP">
<field name="ICID" value="dshhkdhs"/>
<field name="CallID" value="sdidydakyd2133@10.10.10.1"/>
<field name="User-Agent" value="NotPresent"/>
<field name="Request-URI" value="sip:+47668384"/>
<field name="CalledPartyNumber" value="sip:+08779379972"/>
<field name="CallingPartyNumber" value="sip:+07073873772@10.0.0.1"/>
<field name="To" value="sip:+878379739"/>
<field name="From" value="sip:+937973962"/>
</group>
<group name="VPN">
<field name="VPN_NAME_B" value="blshahd"/>
<field name="VPN_Group_B" value="ctr"/>
<field name="B_ExtType" value="part"/>
<field name="B_ISDN" value="7973"/>
<field name="B_SIP" value="67367672"/>
<field name="B_PABXID" value="797397"/>
</group>
</record>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="MESGJG"/>
<field name="RecordTypeHEC" value="DY"/>
<field name="NodeID" value="rock.dsjjgds.cm"/>
<field name="SequenceNumber" value="7937973"/>
<field name="StartDate" value="20171049979"/>
<field name="EndDate" value="201704059739793"/>
<field name="CallDuration" value="973979i"/>
<field name="CauseForRecordClosing" value="normal"/>
</group>
<group name="SIP">
<field name="ICID" value="dshhkdhs"/>
<field name="CallID" value="sdidydakyd2133@10.10.10.1"/>
<field name="User-Agent" value="NotPresent"/>
<field name="Request-URI" value="sip:+47668384"/>
<field name="CalledPartyNumber" value="sip:+08779379972"/>
<field name="CallingPartyNumber" value="sip:+07073873772@10.0.0.1"/>
<field name="To" value="sip:+878379739"/>
<field name="From" value="sip:+937973962"/>
</group>
<group name="VPN">
<field name="VPN_NAME_B" value="blshahd"/>
<field name="VPN_Group_B" value="ctr"/>
<field name="B_ExtType" value="part"/>
<field name="B_ISDN" value="7973"/>
<field name="B_SIP" value="67367672"/>
<field name="B_PABXID" value="797397"/>
</group>
</record>
</records>

I have tried the script below, which parses the XML fields and prints the values as a list.

import sys
import operator
from functools import reduce
from xml.etree.ElementTree import ElementTree

tree = ElementTree()
tree.parse("dummy.xml")
root = tree.getroot()
data = []
groups = root.findall('.//group')
for group in groups:
    data.append([f.attrib['value'] for f in group.findall('./field')])
    q = reduce(operator.concat, data)
    s = ", ".join(q)
print(s)

The output I get is:

RESGJG, PY, rock.dsjjgds.cm, 7937973, 20171049979, 201704059739793, 973979i, normal, dshhkdhs, sdidydakyd2133@10.10.10.1, NotPresent, sip:+47668384, sip:+08779379972, sip:+07073873772@10.0.0.1, sip:+878379739, sip:+937973962, blshahd, ctr, part, 7973, 67367672, 797397, MESGJG, DY, rock.dsjjgds.cm, 7937973, 20171049979, 201704059739793, 973979i, normal, dshhkdhs, sdidydakyd2133@10.10.10.1, NotPresent, sip:+47668384, sip:+08779379972, sip:+07073873772@10.0.0.1, sip:+878379739, sip:+937973962, blshahd, ctr, part, 7973, 67367672, 797397

Desired output:

RecordType:RESGJG, RecordTypeHEC:PY, NodeID:rock.dsjjgds.cm, SequenceNumber:7937973, StartDate:20171049979, EndDate:201704059739793, CallDuration:973979i, CauseForRecordClosing:normal, ICID:dshhkdhs, CallID:sdidydakyd2133@10.10.10.1, User-Agent:NotPresent, Request-URI:sip:+47668384, CalledPartyNumber:sip:+08779379972, CallingPartyNumber:sip:+07073873772@10.0.0.1, To:sip:+878379739, From:sip:+937973962, VPN_NAME_B:blshahd, VPN_Group_B:ctr, B_ExtType:part, B_ISDN:7973, B_SIP:67367672, B_PABXID:797397,

RecordType:MESGJG, RecordTypeHEC:DY, NodeID:rock.dsjjgds.cm, SequenceNumber:7937973, StartDate:20171049979, EndDate:201704059739793, CallDuration:973979i, CauseForRecordClosing:normal, ICID:dshhkdhs, CallID:sdidydakyd2133@10.10.10.1, User-Agent:NotPresent, Request-URI:sip:+47668384, CalledPartyNumber:sip:+08779379972, CallingPartyNumber:sip:+07073873772@10.0.0.1, To:sip:+878379739, From:sip:+937973962, VPN_NAME_B:blshahd, VPN_Group_B:ctr, B_ExtType:part, B_ISDN:7973, B_SIP:67367672, B_PABXID:797397,

Please help.

Tags: python, regex, python-3.x, xml

Solution


Your code reads only the value attribute of each field and ignores the name attribute entirely.

Also, reduce is overkill here: each group's fields can be joined directly, with no need to concatenate lists first.

groups = root.findall('.//group')
for group in groups:
    print(', '.join('{}: {}'.format(field.attrib['name'], field.attrib['value']) for field in group.findall('./field')))
    print()

This outputs:

RecordType: RESGJG, RecordTypeHEC: PY, NodeID: rock.dsjjgds.cm, SequenceNumber: 7937973, StartDate: 20171049979, EndDate: 201704059739793, CallDuration: 973979i, CauseForRecordClosing: normal

ICID: dshhkdhs, CallID: sdidydakyd2133@10.10.10.1, User-Agent: NotPresent, Request-URI: sip:+47668384, CalledPartyNumber: sip:+08779379972, CallingPartyNumber: sip:+07073873772@10.0.0.1, To: sip:+878379739, From: sip:+937973962

VPN_NAME_B: blshahd, VPN_Group_B: ctr, B_ExtType: part, B_ISDN: 7973, B_SIP: 67367672, B_PABXID: 797397

RecordType: MESGJG, RecordTypeHEC: DY, NodeID: rock.dsjjgds.cm, SequenceNumber: 7937973, StartDate: 20171049979, EndDate: 201704059739793, CallDuration: 973979i, CauseForRecordClosing: normal

ICID: dshhkdhs, CallID: sdidydakyd2133@10.10.10.1, User-Agent: NotPresent, Request-URI: sip:+47668384, CalledPartyNumber: sip:+08779379972, CallingPartyNumber: sip:+07073873772@10.0.0.1, To: sip:+878379739, From: sip:+937973962

VPN_NAME_B: blshahd, VPN_Group_B: ctr, B_ExtType: part, B_ISDN: 7973, B_SIP: 67367672, B_PABXID: 797397
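
The loop above prints one line per group and separates name and value with ": ", whereas the question asks for one line per record in the form name:value. A minimal sketch of that variant follows. It assumes the same records/record/group/field layout as dummy.xml; the SAMPLE string is a shortened stand-in for the file, and format_records is a hypothetical helper name, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for dummy.xml (assumption: same element layout,
# fewer fields). For the real file, use ET.parse("dummy.xml").getroot().
SAMPLE = """<records>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>
</group>
<group name="SIP">
<field name="ICID" value="dshhkdhs"/>
</group>
</record>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="MESGJG"/>
<field name="RecordTypeHEC" value="DY"/>
</group>
<group name="SIP">
<field name="ICID" value="dshhkdhs"/>
</group>
</record>
</records>"""


def format_records(root):
    """Return one 'name:value, name:value, ...' string per <record>."""
    lines = []
    for record in root.findall("./record"):
        # './/field' collects the fields of every group inside this
        # record, in document order.
        pairs = ("{}:{}".format(field.get("name"), field.get("value"))
                 for field in record.findall(".//field"))
        lines.append(", ".join(pairs))
    return lines


root = ET.fromstring(SAMPLE)
for line in format_records(root):
    print(line)
    print()  # blank line between records, as in the desired output
```

Iterating over record elements instead of group elements is the only structural change; the join itself is the same as in the loop above. With the shortened sample, the first printed line is RecordType:RESGJG, RecordTypeHEC:PY, ICID:dshhkdhs.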
