How to add XML tags based on a specific condition using Python

Problem description

Sample XML file

<ArticleSet>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. abc@gmail.com</Affiliation>
        <Keywords>-</Keywords>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>-</Affiliation>
        <Keywords>-</Keywords>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. ghi@yahoo.co.in</Affiliation>
        <Keywords>-</Keywords>
    </Article>
</ArticleSet>

Sample code

from xml.etree import ElementTree as etree
import re

root = etree.parse("sampleinput.xml").getroot()

for article in root.iter("Affiliation"):
    if(article.text != "-"):
        email = re.search(r'[\w\.-]+@[\w\.-]+', article.text)
        c = etree.Element("<Email>")
        c.text = email.group(0)
        etree.write(article,c)

Desired output (the updated XML file)

<?xml version="1.0"?>
<ArticleSet>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. abc@gmail.com</Affiliation>
        <Keywords>-</Keywords>
        <Email>abc@gmail.com</Email>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>-</Affiliation>
        <Keywords>-</Keywords>
        <Email>-</Email>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. ghi@yahoo.co.in</Affiliation>
        <Keywords>-</Keywords>
        <Email>ghi@yahoo.co.in</Email>
    </Article>
</ArticleSet>

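I want to extract the email address from the <Affiliation> tag, create a new tag named <Email>, and store the extracted address in it. If <Affiliation> is just "-", that article should get <Email>-</Email> instead.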
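The rule described above (address if present, otherwise "-") can be isolated into a small helper before touching the tree at all. This is a minimal sketch that reuses the question's regex and assumes the address always has the shape `name@domain`; `extract_email` is a hypothetical name, not part of the question or any library.

```python
import re

# Same pattern as the question: word chars, dots and hyphens on both sides of "@"
EMAIL_RE = re.compile(r'[\w.-]+@[\w.-]+')

def extract_email(affiliation):
    """Hypothetical helper: first email in an <Affiliation> text, or "-" if none."""
    # "-", empty text or None contain no "@", so they fall through to "-"
    match = EMAIL_RE.search(affiliation or '')
    return match.group(0) if match else '-'

print(extract_email('harvard university of science. abc@gmail.com'))
print(extract_email('-'))
```

Keeping the extraction separate makes it easy to test the regex on its own and to swap in a stricter pattern later.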

Error

Traceback (most recent call last):
  File "C:/Users/Ghost Rider/Documents/Python/addingTagsToXML.py", line 11, in <module>
    etree.write(article,c)
AttributeError: module 'xml.etree.ElementTree' has no attribute 'write'
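The traceback arises because `write()` is a method of the `ElementTree` object that `parse()` returns, not a function of the `xml.etree.ElementTree` module. Separately, `Element()` expects a bare tag name such as `"Email"`; passing `"<Email>"` produces malformed XML when serialized. A minimal sketch of both points, assuming a tiny in-memory document in place of `sampleinput.xml` and an `io.BytesIO` buffer in place of an output file so it runs standalone:

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for sampleinput.xml so the sketch runs without a file
doc = '<ArticleSet><Article><Affiliation>x abc@gmail.com</Affiliation></Article></ArticleSet>'
tree = ET.ElementTree(ET.fromstring(doc))  # same kind of object ET.parse() returns

for article in tree.getroot().iter('Article'):
    email = ET.SubElement(article, 'Email')  # bare tag name, no angle brackets
    email.text = 'abc@gmail.com'  # fixed value, only to show the tree change

buf = io.BytesIO()
tree.write(buf)  # write() is called on the ElementTree object, not on the module
print(buf.getvalue().decode())
```

Note also that the question iterates over `<Affiliation>` elements, so even with `write` fixed, the new tag would need to be attached to the parent `<Article>`, which is why the sketch iterates over `Article`.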

Tags: python

Solution


You can try this:

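import re
import xml.etree.ElementTree as ET  # "import xml" alone does not load the etree submodule

tree = ET.parse('filename.xml')
root = tree.getroot()

for article in root.findall('Article'):
    child = ET.Element('Email')
    # Look up <Affiliation> by tag name rather than by position (article[2])
    affiliation = article.find('Affiliation').text or '-'
    match = re.search(r'[\w\.-]+@[\w\.-]+', affiliation)
    # Store the address if one was found, otherwise "-" as in the desired output
    child.text = match.group() if match else '-'
    article.append(child)  # same as insert(4, child) here: <Email> becomes the last child

ET.indent(tree, space='    ')  # Python 3.9+: re-indent so the new tags line up
tree.write('filename.xml', encoding='utf-8', xml_declaration=True)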
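For a version that can be checked without reading or overwriting files, the same logic can be wrapped in a function that takes and returns XML text. This is a sketch, not the answerer's code: `add_email_tags` is a hypothetical name, it uses `ET.SubElement` so `<Email>` is appended regardless of how many children an `<Article>` already has, and `findtext(..., default='-')` treats a missing `<Affiliation>` the same as "-".

```python
import re
import xml.etree.ElementTree as ET

EMAIL_RE = re.compile(r'[\w.-]+@[\w.-]+')

def add_email_tags(xml_text):
    """Hypothetical helper: return xml_text with an <Email> child in every <Article>."""
    root = ET.fromstring(xml_text)
    for article in root.findall('Article'):
        # findtext gives '' for an empty tag and the default for a missing one
        affiliation = article.findtext('Affiliation', default='-')
        match = EMAIL_RE.search(affiliation)
        email = ET.SubElement(article, 'Email')  # appended as the last child
        email.text = match.group(0) if match else '-'
    return ET.tostring(root, encoding='unicode')

sample = ('<ArticleSet>'
          '<Article><Affiliation>-</Affiliation></Article>'
          '<Article><Affiliation>harvard university of science. ghi@yahoo.co.in</Affiliation></Article>'
          '</ArticleSet>')
print(add_email_tags(sample))
```

Working on strings keeps the transformation pure; writing the result to disk (or calling `ET.indent` first) can then be done separately.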
