Iterating over a variable-length XML file in Python 3

Problem description

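I have a deeply nested XML file that I need to iterate over to extract records. I followed a few examples for reading XML and assumed that every record had the same number of fields, but after extracting some data I found that this is not the case. Here is my code: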

import xml.etree.ElementTree as ET
tree = ET.parse('EcommProdotti.xml')
root = tree.getroot()
print("Printing on file...")

with open("prodotti.txt", "w") as f:
    for child in root:
        for element in child.iter('Products'):
            for sub_element in element.iter('Product'):
                length = len(sub_element) + 1
                # Positional indexing: assumes every Product has the same children in the same order
                my_string = sub_element[1].text + " " + sub_element[2].text + " " + sub_element[9].text + "\n"
                f.write(my_string)

As you can see, my records are in the sub_element nodes, and the number of children in each one can vary, as the following sample XML file shows:

<?xml version="1.0" encoding="UTF-8"?>
<!-- File in formato Easyfatt-XML creato con Danea Easyfatt - www.danea.it/software/easyfatt -->
<!-- Per importare o creare un file in formato Easyfatt-Xml, consultare la documentazione tecnica: www.danea.it/software/easyfatt/xml -->
<EasyfattProducts AppVersion="2" Creator="Danea Easyfatt Enterprise One  2019.45d" CreatorUrl="http://www.danea.it/software/easyfatt" Mode="full" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://www.danea.it/public/prodotti.xsd">
  <Products>
    <Product>
      <InternalID>35</InternalID>
      <Code>00035</Code>
      <Description>12 PEZZI ROTOLO SACCHETTO IGIENICO PER CANI</Description>
      <DescriptionHtml></DescriptionHtml>
      <Category>ACCESSORI PER ANIMALI</Category>
      <Subcategory>PRODOTTI IGIENE E PULIZIA</Subcategory>
      <Vat Perc="22" Class="Imponibile" Description="Imponibile 22%">22</Vat>
      <Um>PZ</Um>
      <NetPrice1>2.46</NetPrice1>
      <GrossPrice1>3</GrossPrice1>
      <Barcode>8805786177364</Barcode>
      <SupplierCode>0004</SupplierCode>
      <SupplierName>LED STORM SRLS</SupplierName>
      <SupplierNetPrice>1.898</SupplierNetPrice>
      <SupplierGrossPrice>2.3156</SupplierGrossPrice>
      <SizeUm>cm</SizeUm>
      <WeightUm>kg</WeightUm>
      <GrossWeight>0.2</GrossWeight>
      <ManageWarehouse>true</ManageWarehouse>
      <OrderWaitDays>10</OrderWaitDays>
      <AvailableQty>6</AvailableQty>
      <Notes></Notes>
    </Product>
    <Product>
      <InternalID>1155</InternalID>
      <Code>01144</Code>
      <Description>ADVANCE CANE ATOPIC MEDIUM/MAXI 3 KG</Description>
      <DescriptionHtml></DescriptionHtml>
      <Category>ALIMENTI PER ANIMALI</Category>
      <Subcategory>ALIMENTI CURATIVI (DIETETICI)</Subcategory>
      <Vat Perc="22" Class="Imponibile" Description="Imponibile 22%">22</Vat>
      <Um>PZ</Um>
      <NetPrice1>20.48</NetPrice1>
      <GrossPrice1>24.99</GrossPrice1>
      <Barcode>8410650170695</Barcode>
      <ProducerName>Affinity</ProducerName>
      <SupplierCode>0033</SupplierCode>
      <SupplierName>LOCONTE VITO &amp; C. S.A.S.</SupplierName>
      <SupplierProductCode>ADV924483</SupplierProductCode>
      <SupplierNetPrice>13.0438</SupplierNetPrice>
      <SupplierGrossPrice>15.9134</SupplierGrossPrice>
      <SizeUm>cm</SizeUm>
      <WeightUm>kg</WeightUm>
      <GrossWeight>3</GrossWeight>
      <ManageWarehouse>true</ManageWarehouse>
      <AvailableQty>8</AvailableQty>
      <Notes></Notes>
    </Product>
    <Product>
      <InternalID>203</InternalID>
      <Code>00198</Code>
      <Description>ADVANTIX CANE FINO A 4 KG</Description>
      <DescriptionHtml></DescriptionHtml>
      <Category>ACCESSORI PER ANIMALI</Category>
      <Subcategory>ANTIPARASSITARI</Subcategory>
      <Vat Perc="10" Class="Imponibile" Description="Imponibile 10%">10</Vat>
      <Um>PZ</Um>
      <NetPrice1>19.82</NetPrice1>
      <GrossPrice1>21.8</GrossPrice1>
      <Barcode>4007221009597</Barcode>
      <ProducerName>Bayer</ProducerName>
      <SupplierCode>0033</SupplierCode>
      <SupplierName>LOCONTE VITO &amp; C. S.A.S.</SupplierName>
      <SupplierProductCode>BYR03382048</SupplierProductCode>
      <SupplierNetPrice>16.25</SupplierNetPrice>
      <SupplierGrossPrice>17.875</SupplierGrossPrice>
      <SizeUm>cm</SizeUm>
      <WeightUm>kg</WeightUm>
      <GrossWeight>0.04</GrossWeight>
      <ManageWarehouse>true</ManageWarehouse>
      <OrderWaitDays>10</OrderWaitDays>
      <AvailableQty>10</AvailableQty>
      <Notes></Notes>
      <ExtraBarcodes>
        <Barcode>103629046</Barcode>
        <Barcode>4007221046424</Barcode>
      </ExtraBarcodes>
    </Product>
  </Products>
</EasyfattProducts>

So, how can I iterate over this file? Thanks in advance for your time and help.

Tags: python-3.x, python-3.6

Solution


You can use XPath queries to look up each field by tag name instead of indexing child elements by position:

import xml.etree.ElementTree as ET
tree = ET.parse('EcommProdotti.xml')
root = tree.getroot()
for product in root.findall(".//Products/Product"):
    for field in ['Code', 'Description', 'GrossPrice1', 'SupplierProductCode']:
        value = product.find(field)
        # find() returns None when the tag is absent from this Product
        if value is not None:
            print(value.text, end=' ')
        else:
            print('Not defined', end=' ')
    print()
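
As a variation on the same idea, the lookup can be wrapped in a small helper that returns one dictionary per product. This is only a sketch: `extract_products`, the `missing` placeholder, and the trimmed-down `SAMPLE` document are illustrative names that do not appear in the original answer. It relies on `findtext` with a default value for optional fields and on `findall` for the repeated `Barcode` entries under `ExtraBarcodes`.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample based on the question's data (illustrative only).
SAMPLE = """\
<EasyfattProducts>
  <Products>
    <Product>
      <Code>00035</Code>
      <Description>12 PEZZI ROTOLO SACCHETTO IGIENICO PER CANI</Description>
      <GrossPrice1>3</GrossPrice1>
    </Product>
    <Product>
      <Code>00198</Code>
      <Description>ADVANTIX CANE FINO A 4 KG</Description>
      <GrossPrice1>21.8</GrossPrice1>
      <SupplierProductCode>BYR03382048</SupplierProductCode>
      <ExtraBarcodes>
        <Barcode>103629046</Barcode>
        <Barcode>4007221046424</Barcode>
      </ExtraBarcodes>
    </Product>
  </Products>
</EasyfattProducts>
"""


def extract_products(root, fields, missing="Not defined"):
    """Return one dict per <Product>, looking fields up by tag name.

    Hypothetical helper: the name and the 'missing' placeholder are assumptions.
    """
    records = []
    for product in root.findall(".//Products/Product"):
        # findtext() returns the default when the child tag is absent
        record = {name: product.findtext(name, default=missing) for name in fields}
        # Repeated children are collected into a list (empty if there are none)
        record["ExtraBarcodes"] = [
            barcode.text for barcode in product.findall("ExtraBarcodes/Barcode")
        ]
        records.append(record)
    return records


root = ET.fromstring(SAMPLE)
fields = ["Code", "Description", "GrossPrice1", "SupplierProductCode"]
records = extract_products(root, fields)
for record in records:
    print(" ".join(record[name] for name in fields))
```

To produce the text file from the question, each joined line could be written with `f.write(...)` inside a `with open("prodotti.txt", "w") as f:` block instead of being printed.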
