Python xml.etree.ElementTree.ParseError when converting XML to CSV

Problem description


My code that converts XML to CSV fails because a special character was inserted into the page I download from (see the second DocumentId block).
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<RegisterSearch TotalResultsOnPage="500" TotalResults="181686" TotalPages="364" PageSize="500" CurrentPage="164">
    <SearchResults>
        <Document DocumentId="1348828088746657652">
            <Author>SOGETREL</Author>
            <Category>CFA – Convention de facade</Category>
            <DateModified>2021-09-30T13:44:44.997Z</DateModified>
            <Discipline>EXE - Phase Travaux</Discipline>
            <DocumentNumber>52_017_128_EXE_CCR_CFA_S01_010</DocumentNumber>
            <DocumentStatus>1 - Diffusion</DocumentStatus>
            <DocumentType>Convention</DocumentType>
            <FileType>pdf</FileType>
            <Filename>CFA SIGNE -14 Rue Colonel Dubois WASSY - SRO 52-017-128-.pdf</Filename>
            <ReviewStatus>En Workflow</ReviewStatus>
            <Revision>A</Revision>
            <SelectList1>52 - Haute-Marne</SelectList1>
            <SelectList2>017</SelectList2>
            <SelectList3>128</SelectList3>
            <SelectList4>WASSY (52550)</SelectList4>
            <SelectList6>010</SelectList6>
            <Title>CONVENTION FACADE-14-RUE DU LIEUTENANT COLONEL DUBOIS-52130-WASSY</Title>
            <TrackingId>1348828088469026914</TrackingId>
            <Vdrcode>CCR - Concertation</Vdrcode>
        </Document>
        <Document DocumentId="1348828088742847506">
            <Author>SOGETREL</Author>
            <Category>CFA – Convention de facade</Category>
            <DateModified>2021-09-16T12:18:49.690Z</DateModified>
            <Discipline>EXE - Phase Travaux</Discipline>
            <DocumentNumber>52_017_128_EXE_CCR_CFA_S01_010</DocumentNumber>
            <DocumentStatus>1 - Diffusion</DocumentStatus>
            <DocumentType>Convention</DocumentType>
            <FileType>pdf</FileType>
            <Filename>CFA SIGNE -14 Rue Colonel Dubois WASSY - SRO 52-017-128-.pdf</Filename>
            <ReviewStatus>En Workflow</ReviewStatus>
            <Revision>A</Revision>
            <SelectList1>52 - Haute-Marne</SelectList1>
            <SelectList2>017</SelectList2>
            <SelectList3>128</SelectList3>
            <SelectList4>WASSY (52550)</SelectList4>
            <SelectList6>010</SelectList6>
            <Title>CONVENTION FACADE-14-RUE DU LIEUTENANT COLONEL DUBOIS52130-WASSY</Title>
            <TrackingId>1348828088469026914</TrackingId>
            <Vdrcode>CCR - Concertation</Vdrcode>
        </Document>
    </SearchResults>
</RegisterSearch>

The character is not visible here, but it shows up when the file is opened in Visual Studio Code.
[Screenshot of the error]

I tried to replace the character with the code below, but the result is the same.

import xml.etree.ElementTree as ET

with open("download_xml_164.xml", encoding="utf-8") as f:
  tree = ET.parse(f)
  root = tree.getroot()

  for elem in root.iter():  # getiterator() was removed in Python 3.9
    try:
      elem.text = elem.text.replace('DUBOIS52130', 'DUBOIS-52130')
    except AttributeError:
      pass

tree.write("replace_xml_164.xml", encoding="utf-8")

Any ideas or suggestions for working around this problem?
Thanks in advance!
Stephanie

Tags: python, python-3.x, xml, xml-parsing, elementtree

Solution
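
The replacement in the question never runs: ET.parse() raises the ParseError before the loop is reached, so the text has to be cleaned before it is handed to the parser. The question does not say which character is involved, so the sketch below assumes it is a control character that XML 1.0 does not allow, which is the usual reason the parser rejects an otherwise well-formed file. The helper names (clean_xml_text, documents_to_rows, rows_to_csv, convert_file) and the output file name are hypothetical, and replacing the character with "-" is an assumption based on the 'DUBOIS-52130' replacement attempted in the question.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Matches every character outside the XML 1.0 "Char" production
# (tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF).
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_xml_text(text, replacement="-"):
    """Replace characters the XML parser would reject (assumed cause of the error)."""
    return INVALID_XML_CHARS.sub(replacement, text)


def documents_to_rows(xml_text):
    """Parse the cleaned XML and return one dict per <Document> element."""
    root = ET.fromstring(clean_xml_text(xml_text))
    rows = []
    for doc in root.iter("Document"):
        row = {"DocumentId": doc.get("DocumentId")}
        for child in doc:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text, using the union of all keys as columns."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def convert_file(xml_path, csv_path):
    """Read the raw file as text, clean it, and write the CSV next to it."""
    with open(xml_path, encoding="utf-8") as f:
        xml_text = f.read()
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        f.write(rows_to_csv(documents_to_rows(xml_text)))
```

With the file from the question, this would be called as convert_file("download_xml_164.xml", "download_xml_164.csv"). If the parser still raises after cleaning, the character is probably something else, and the line and column reported in the ParseError message are the place to look.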

