Extracting data from an XML string with xml.etree.ElementTree

Problem description

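I have an XML string and need to extract the text of the first three "col" tags in each "row" group. In other words, the output should look like this: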

['1043100330', 'Smith', 'John', '1043100331', 'Swartz', 'Francis', '1043100332', 'Laff', 'Michael']

Here is the XML:

data = '''<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
   <soapenv:Body>
      <processOXIMessageResponse xmlns="urn:com:singun:webservice" xmlns:ns="urn:com:singun:webservice">
         <ns1:processOXIMessageReturn xmlns:ns1="urn:com:singun:webservice">
            <SingunDocument xmlns="C" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" protocol="OXI">
               <sessionId xmlns="">1613762599483</sessionId>
               <command xmlns="" echo="" xsi:type="ServiceProviderGetListResponse">
                  <serviceProviderTable>
                     <colHeading>User Id</colHeading>
                     <colHeading>Last Name</colHeading>
                     <colHeading>First Name</colHeading>
                     <colHeading>Email Address</colHeading>
                     <colHeading>Phone Number</colHeading>
                     <colHeading>Extension</colHeading>
                     <colHeading>Country Code</colHeading>
                     <colHeading>National Prefix</colHeading>
                     <row>
                        <col>1043100330</col>
                        <col>Smith</col>
                        <col>John</col>
                        <col>jsmith@example.com</col>
                        <col>1043101330</col>
                        <col>1330</col>
                        <col>52</col>
                        <col>52</col>
                     </row>
                     <row>
                        <col>1043100331</col>
                        <col>Swartz</col>
                        <col>Francis</col>
                        <col>fswartz@example.com</col>
                        <col>1043101331</col>
                        <col>1331</col>
                        <col>52</col>
                        <col>52</col>
                     </row>
                     <row>
                        <col>1043100332</col>
                        <col>Laff</col>
                        <col>Michael</col>
                        <col>mlaff@example.com</col>
                        <col>1043101332</col>
                        <col>1332</col>
                        <col>52</col>
                        <col>52</col>
                     </row>
                  </serviceProviderTable>
               </command>
            </SingunDocument>
         </ns1:processOXIMessageReturn>
      </processOXIMessageResponse>
   </soapenv:Body>
</soapenv:Envelope>'''

Here is my code, but it extracts only the first "col" tag of each group rather than the first three, which is what I want:

import xml.etree.ElementTree as ET

users = []
root = ET.fromstring(data)
# col[1] selects only the first <col> child of each <row>
for col in root.iterfind('.//row/col[1]'):
    users.append(col.text)
print(users)

This is the output of my code:

['1043100330', '1043100331', '1043100332']

Any help would be appreciated. Thank you very much.

Tags: python-3.x, xml

Solution


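You need to take both the namespaces and the structure of the XML into account. In the XML shown above, <command xmlns=""> resets the default namespace, so serviceProviderTable, row and col are in no namespace and are matched by their plain names; a qualified name such as {C}row matches nothing there. The predicate col[1] selects only the first col of each row, so iterate over the rows and slice off the first three col elements instead. Try something like:

import xml.etree.ElementTree as ET

root = ET.fromstring(data)
users = []
# Each row describes one user; keep only its first three col values.
for row in root.findall('.//serviceProviderTable/row'):
    cols = row.findall('col')
    users.append([col.text.strip() for col in cols[:3]])
for user in users:
    print(user)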

Output:

['1043100330', 'Smith', 'John']
['1043100331', 'Swartz', 'Francis']
['1043100332', 'Laff', 'Michael']
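
If a single flat list is wanted, as in the question, the per-row values can be collected into one list instead of one list per user. The following is only a minimal sketch: SAMPLE is an abbreviated, hypothetical document with the same namespace layout as the XML in the question, and first_cols is a hypothetical helper name, not part of ElementTree. Printing a few element tags first is a quick way to check which names, if any, need a namespace prefix.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical sample with the same namespace layout as the question.
SAMPLE = '''<SingunDocument xmlns="C">
  <command xmlns="">
    <serviceProviderTable>
      <row><col>1043100330</col><col>Smith</col><col>John</col><col>jsmith@example.com</col></row>
      <row><col>1043100331</col><col>Swartz</col><col>Francis</col><col>fswartz@example.com</col></row>
    </serviceProviderTable>
  </command>
</SingunDocument>'''


def first_cols(xml_text, count=3):
    """Return the first `count` col values of every row as one flat list."""
    root = ET.fromstring(xml_text)
    values = []
    for row in root.iter('row'):
        # findall('col') returns the direct col children in document order.
        for col in row.findall('col')[:count]:
            values.append((col.text or '').strip())
    return values


root = ET.fromstring(SAMPLE)
# Namespaced elements show up as '{uri}name'; elements without one as plain 'name'.
tags = [el.tag for el in root.iter()][:4]
print(tags)

flat = first_cols(SAMPLE)
print(flat)
```

Passing a different count, for example first_cols(SAMPLE, 1), reproduces the behaviour of the col[1] predicate from the question.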
