Find and replace XML data inside tags using Python and ElementTree


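First of all, I'm very new to Python and know very little. However, I've been tasked with building this program, so I'd appreciate your help.

I need to anonymise the data in an XML file. This will involve changing several tags to NULL.

I'm starting by trying to replace the DateOfBirth data using Python with ElementTree. I need the contents of the DateOfBirth tag to be replaced with NULL.

This is a snippet of one XML file containing MOCK data for a learner. It includes 1 learner; normally there would be 1-1000 learners, and all of these values need to be changed to NULL throughout.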

<?xml version="1.0" encoding="UTF-8"?>
<!-- Please note that this file is properly formed, and serves as an example of a file that will load into the ILR DC system.  The data is anonymised and does not refer to a real-world provider, learning delivery or learner.  Based on the ILR specification, version 2, dated April 2018-->
<Message xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="ESFA/ILR/2018-19" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ESFA/ILR/2018-19">
            <!-- This and the next element only appear in files generated by FIS -->
            <ReferenceData>Version5.0, LARS 2017-08-01</ReferenceData>
        <!-- The SourceFiles group only appears in files generated by FIS -->
            <SoftwareSupplier>Software Systems Inc.</SoftwareSupplier>
    <!-- 16 yr old learner undertaking full time 16-19 (excluding apprenticeships) funded programme -->
        <PostcodePrior>BR1 7SS</PostcodePrior>
        <Postcode>BR1 7SS</Postcode>
        <AddLine1>The Street</AddLine1>
        <!-- Employment status record is not required for full time 16-19 (excluding apprenticeships) funded learners  -->
        <!-- 16-19  (excluding apprenticeships) funded study programme -->
            <DelLocPostCode>BR1 3RL</DelLocPostCode>
            <DelLocPostCode>BR2 7UP</DelLocPostCode>


import os 
from xml.etree import ElementTree as et 

base_path  = os.path.dirname(os.path.realpath(__file__))

xml_file = os.path.join(base_path, "ILR_mock_data.xml") 

tree = et.parse(xml_file) 

# root = tree.getroot()

# for child in root:
#     print(child.tag, child.attrib)

#for child in root:
#    for element in child:
#        print(element.tag, ":", element.text)

tree.find('Learner/DateOfBirth').text = 'NULL'



 Traceback (most recent call last):
  File "C:/Users/jkay/Desktop/Anon Tool RCU/RCU MOCK TOOL (Anonamising).py", line 20, in <module>
    tree.find('Learner/DateOfBirth').text = 'NULL'
AttributeError: 'NoneType' object has no attribute 'text'

I want the program to run through the XML file and return a new file with every date of birth replaced by NULL.




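Beautiful Soup looks like the solution you are looking for here. It is a library built specifically for parsing HTML and XML files (though you may also have to install a parser; the 'xml' feature used below relies on lxml).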


from bs4 import BeautifulSoup

with open("my_file.xml", "r") as infile:
    xml_text = infile.read()

soup = BeautifulSoup(xml_text, 'xml')

# replace all DateOfBirth tag contents with NULL
for dob_tag in soup.find_all("DateOfBirth"):
    dob_tag.string = "NULL"

# output and save modified file
with open("my_file_edited.xml", "w") as outfile:
    outfile.write(str(soup))

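If you would rather stay with ElementTree from the standard library, the AttributeError in the question is most likely a namespace issue: the Message element declares xmlns="ESFA/ILR/2018-19", so ElementTree sees the tags as {ESFA/ILR/2018-19}Learner and {ESFA/ILR/2018-19}DateOfBirth, and find('Learner/DateOfBirth') returns None. Below is a minimal sketch under that assumption. The helper name null_out and the cut-down SAMPLE document are hypothetical and only there for illustration; the namespace URI is copied from the snippet in the question.

```python
from xml.etree import ElementTree as ET

# Namespace URI copied from the xmlns attribute on <Message> in the question.
ILR_NS = "ESFA/ILR/2018-19"


def null_out(root, tag_names, ns=ILR_NS):
    """Set the text of every element whose tag is in tag_names to 'NULL'.

    Hypothetical helper; returns the number of elements changed.
    """
    changed = 0
    for name in tag_names:
        # ElementTree spells a namespaced tag as "{uri}localname".
        for element in root.iter(f"{{{ns}}}{name}"):
            element.text = "NULL"
            changed += 1
    return changed


# Hypothetical, cut-down sample; the real ILR file has many more elements.
SAMPLE = """<Message xmlns="ESFA/ILR/2018-19">
  <Learner>
    <DateOfBirth>2002-01-01</DateOfBirth>
    <Postcode>BR1 7SS</Postcode>
  </Learner>
  <Learner>
    <DateOfBirth>2001-05-17</DateOfBirth>
    <Postcode>BR2 7UP</Postcode>
  </Learner>
</Message>
"""

# Keep the default namespace unprefixed when the tree is serialised.
ET.register_namespace("", ILR_NS)

root = ET.fromstring(SAMPLE)
count = null_out(root, ["DateOfBirth", "Postcode"])
result = ET.tostring(root, encoding="unicode")
print(count)
print(result)
```

For a real file you would load it with ET.parse(path), call the helper on tree.getroot(), and save with tree.write(new_path, encoding="UTF-8", xml_declaration=True). Two caveats: the default ElementTree parser drops XML comments, so the comments in your file would not survive the round trip, and other namespace declarations may be serialised differently from the input.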