How do I build a Python dictionary representing all nodes and data in an XML file?

Problem description

My test XML file:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE PPP
  SYSTEM 'PPP.DTD'>
<book chg="R" model="AB" >
    <chapter chapnbr="09" chg="U" key="EN49" >
        <effect effrg="Afcd"/>
        <title>HOW TO WIN</title>
        <section chapnbr="09" chg="U" key="Edff" revdate="20100701" sectnbr="102">
            <title>What a start</title>
            <subject chapnbr="09" chg="U" key="Edff" revdate="20100701" sectnbr="102" subjnbr="00">
                <title>1.A</title>
                <pgblk chapnbr="09" chg="U" confnbr="00" key="Edff00" pgblknbr="00" revdate="20200701" sectnbr="102" subjnbr="00">
                    <effect effrg="12"/>
                    <title>1.A.i) Plan Ahead for the worst</title>
                    <prclist1>
                        <prcitem1 adns-numbering="8" adns-title="learning my way with help of good people" >
                            <effect effrg="Edff"/>
                            <prcitem asFragment="true">
                                <title>1.A.i) Plan Ahead for the worst</title>
                                <para>It was a cold January night, and I had too much whisky. 
                                    <refblock>
                                        09-102-00
                                        <refint rrr="22,445,555,555,555" refid="Edff0898">
                                            <effect effrg="Edff0899"/>
                                            0910200</refint>
                                    </refblock>. </para>
                                <para>In more usual circumstances, I possesed the self-control. Not this time 
                                    <refblock>
                                        09-102-00-1111
                                        <refint rrr="sdf,2323,2323" refid="Edff123">
                                            <effect effrg="Edff12434"/>
                                            09-102-00</refint>
                                    </refblock>. </para>
                            </prcitem>
                        </prcitem1>
                    </prclist1>
                </pgblk>
            </subject>
        </section>
    </chapter>
</book>

I am trying to build a dictionary of all the elements (so that I can use it to build a Pandas DataFrame).

To parse the XML and get the root element:

import xml.etree.ElementTree as ET

# Passing the path lets ElementTree open and close the file itself.
parsed = ET.parse('my_testxml.xml')
root = parsed.getroot()

To put all the nodes of the XML file into a list:

all_Nodes = list(root.iter())

(The original post showed the output of all_Nodes as an image.)

Then I start collecting the output in a dictionary:


for i,elem in enumerate(all_Nodes):
    
    d={}
    d[all_Nodes[i].tag] = all_Nodes[i].text

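I expected the loop above to produce a dictionary with the element tags as keys and the corresponding text as values. I could also try adding a line for the attributes.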

But the only result I get is

d = {'effect': None}

I tried xmltodict, but it keeps giving me a syntax error.

Tags: python, xml, pandas, xml-parsing, elementtree

Solution


OK, got it working:


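# Collect each element's tag and text in document order.
# A single pass is enough; the nested loops rebuilt the same lists repeatedly.
l_tags = [elem.tag for elem in all_Nodes]
l_text = [elem.text for elem in all_Nodes]

# A list of (tag, text) tuples keeps repeated tags such as 'title' and 'effect',
# which a dictionary keyed by tag would overwrite.
final = list(zip(l_tags, l_text))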
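
A note on why the loop in the question ended with a single entry: `d = {}` sits inside the loop, so the dictionary is recreated on every iteration and only the last element (an `effect` with no text) survives. Moving it outside the loop would still lose data, because tags such as `title` and `effect` repeat and each new value overwrites the previous one. The (tag, text) pairs above avoid that, but they drop the attributes and any text that follows a child element (ElementTree stores that in `.tail`). The sketch below is one possible way to keep all of it as one dict per element. The `SAMPLE` string is a hypothetical, shortened stand-in for the question's file, and the helper names `clean` and `element_rows` and the `@` prefix on attribute keys are illustrative choices, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample modeled on the XML in the question.
SAMPLE = """<book chg="R" model="AB">
    <chapter chapnbr="09" key="EN49">
        <effect effrg="Afcd"/>
        <title>HOW TO WIN</title>
        <para>It was cold. <refint refid="Edff0898"><effect effrg="Edff0899"/>0910200</refint>.</para>
    </chapter>
</book>"""


def clean(value):
    """Strip surrounding whitespace; map None or whitespace-only text to None."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def element_rows(root):
    """Return one dict per element, in document order."""
    rows = []
    for elem in root.iter():
        row = {
            "tag": elem.tag,
            "text": clean(elem.text),   # text before the first child
            "tail": clean(elem.tail),   # text after the element's closing tag
        }
        # Prefix attribute names so they cannot collide with the fixed keys.
        row.update({"@" + name: value for name, value in elem.attrib.items()})
        rows.append(row)
    return rows


root = ET.fromstring(SAMPLE)
rows = element_rows(root)
for row in rows:
    print(row)
```

Because every element becomes its own dict, repeated tags are preserved. A list of dicts like `rows` can be passed directly to `pandas.DataFrame(rows)`; attributes that an element does not have simply show up as missing values in that column.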
