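python-3.x - Extracting data from an XML string with xml.etree.ElementTree
Question
I have an XML string and I need to extract the first three "col" elements from each "row" group. In other words, the output should look like this: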
['1043100330', 'Smith', 'John', '1043100331', 'Swartz', 'Francis', '1043100332', 'Laff', 'Michael']
Here is the XML:
data = '''<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<processOXIMessageResponse xmlns="urn:com:singun:webservice" xmlns:ns="urn:com:singun:webservice">
<ns1:processOXIMessageReturn xmlns:ns1="urn:com:singun:webservice">
<SingunDocument xmlns="C" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" protocol="OXI">
<sessionId xmlns="">1613762599483</sessionId>
<command xmlns="" echo="" xsi:type="ServiceProviderGetListResponse">
<serviceProviderTable>
<colHeading>User Id</colHeading>
<colHeading>Last Name</colHeading>
<colHeading>First Name</colHeading>
<colHeading>Email Address</colHeading>
<colHeading>Phone Number</colHeading>
<colHeading>Extension</colHeading>
<colHeading>Country Code</colHeading>
<colHeading>National Prefix</colHeading>
<row>
<col>1043100330</col>
<col>Smith</col>
<col>John</col>
<col>jsmith@example.com</col>
<col>1043101330</col>
<col>1330</col>
<col>52</col>
<col>52</col>
</row>
<row>
<col>1043100331</col>
<col>Swartz</col>
<col>Francis</col>
<col>fswartz@example.com</col>
<col>1043101331</col>
<col>1331</col>
<col>52</col>
<col>52</col>
</row>
<row>
<col>1043100332</col>
<col>Laff</col>
<col>Michael</col>
<col>mlaff@example.com</col>
<col>1043101332</col>
<col>1332</col>
<col>52</col>
<col>52</col>
</row>
</serviceProviderTable>
</command>
</SingunDocument>
</ns1:processOXIMessageReturn>
</processOXIMessageResponse>
</soapenv:Body>
</soapenv:Envelope>'''
Below is my code, but it extracts only the first "col" element of each row instead of the first three that I want:
import xml.etree.ElementTree as ET
users = []
root = ET.fromstring(data)  # data is the XML string defined above
for col in root.iterfind('.//row/col[1]'):  # [1] matches only the first col of each row
    users.append(col.text)
print(users)
This is the output of my code:
['1043100330', '1043100331', '1043100332']
Any help would be greatly appreciated. Thank you very much.
Answer
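You need to take both the namespaces and the structure of the XML into account. Note that <command xmlns=""> resets the default namespace, so serviceProviderTable, row, and col are not in namespace "C" and must be matched without a {C} prefix (a {C} prefix would find nothing in this document). Select each row first, then slice its col children. Try something like:
users = []
root = ET.fromstring(data)
# Select every row of the table, then keep the first three col children of each.
rows = root.findall('.//serviceProviderTable/row')
for row in rows:
    entries = row.findall('col')
    users.append([entry.text.strip() for entry in entries[:3]])
for user in users:
    print(user)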
Output:
['1043100330', 'Smith', 'John']
['1043100331', 'Swartz', 'Francis']
['1043100332', 'Laff', 'Michael']
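The question asks for one flat list rather than one list per row. A minimal, self-contained sketch of that variant follows. The `sample` string is a reduced stand-in for the SOAP response (an assumption: same nesting and namespace declarations, fewer rows and columns), and `first_cols` is a hypothetical helper name, not part of the original post.

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for the SOAP response (assumption: same nesting and
# namespace declarations as the original, but fewer rows and columns).
sample = '''<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<SingunDocument xmlns="C" protocol="OXI">
<command xmlns="" echo="">
<serviceProviderTable>
<colHeading>User Id</colHeading>
<row><col>1043100330</col><col>Smith</col><col>John</col><col>jsmith@example.com</col></row>
<row><col>1043100331</col><col>Swartz</col><col>Francis</col><col>fswartz@example.com</col></row>
</serviceProviderTable>
</command>
</SingunDocument>
</soapenv:Body>
</soapenv:Envelope>'''


def first_cols(xml_text, count=3):
    """Return the text of the first `count` col children of every row, flattened."""
    root = ET.fromstring(xml_text)
    flat = []
    # row and col carry no namespace because <command xmlns=""> resets the default.
    for row in root.iterfind('.//row'):
        # findall('col') returns the direct col children in document order.
        flat.extend((col.text or '').strip() for col in row.findall('col')[:count])
    return flat


users = first_cols(sample)
print(users)
```

Slicing in Python is used here because ElementTree's limited XPath support accepts a single position such as `col[1]` but not a range predicate such as `position() <= 3`.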