首页 > 解决方案 > 根据特定值从 XML 创建多个数据框

问题描述

我正在尝试解析 XML 并将结果保存在 Pandas Data-frame 中。我已成功将详细信息保存在一个特定的数据框中。但是,现在我试图根据一个特定的类值将结果保存在多个数据框中。

import pandas as pd
import xml.etree.ElementTree as ET
import os
from collections import defaultdict, OrderedDict

tree = ET.parse('PowerChange_76.xml')
root = tree.getroot()
df_list = []

for i, child in enumerate(root): 
    for subchildren in child.findall('{raml20.xsd}header'):
        for subchildren in child.findall('{raml20.xsd}managedObject'):
            match_found = 0
            xml_class_name = subchildren.get('class')
            xml_dist_name = subchildren.get('distName')
            print(xml_class_name)

            df_dict = OrderedDict()                   
            for subchild in subchildren:
                header = subchild.attrib.get('name')
                df_dict['Class'] = xml_class_name
                df_dict['CellDN'] = xml_dist_name
                df_dict[header]=subchild.text

            df_list.append(df_dict)
df_cm = pd.DataFrame(df_list) 

预期结果是根据“类”的数量创建多个数据框。

电流输出:

在此处输入图像描述

XML 文件

标签: python-3.xpandaselementtree

解决方案


这是通过以下方法回答的:

def ExtractMOParam(xmlfile2):
tree2=etree.parse(xmlfile2)
root2=tree2.getroot()
df_list2=[]
for i, child in enumerate(root2):
    for subchildren in (child.findall('{raml21.xsd}header') or child.findall('{raml20.xsd}header')):
        for subchildren in (child.findall('{raml21.xsd}managedObject') or child.findall('{raml20.xsd}managedObject')):
            xml_class_name2 = subchildren.get('class')
            xml_dist_name2 = subchildren.get('distName')
            if ((xml_class_name2 in GetMOClass) and (xml_dist_name2 in GetCellDN)):
                #xml_dist_name2 = subchildren.get('distName')
                #df_list1.append(xml_class_name1)
                for subchild in subchildren:
                    df_dict2=OrderedDict()
                    header2=subchild.attrib.get('name')
                    df_dict2['MOClass']=xml_class_name2
                    df_dict2['CellDN']=xml_dist_name2
                    df_dict2['Parameter']=header2
                    df_dict2['CurrentValue']=subchild.text
                    df_list2.append(df_dict2)
        return df_list2

ExtractDump=pd.DataFrame(ExtractMOParam(inputdfile))

d = dict(tuple(ExtractDump.groupby('MOClass')))

for key in d:
    d[key]=d[key].reset_index().groupby(['CellDN','MOClass','Parameter'])['CurrentValue'].aggregate('first').unstack()
    d[key].reset_index(level=0, inplace=True)
    d[key].reset_index(level=0, inplace=True)

writer = pd.ExcelWriter('ExtractedDump.xlsx', engine='xlsxwriter')
for tab_name, dframe in d.items():
    dframe.to_excel(writer, sheet_name=tab_name,index=False)
writer.save() 

希望这对其他人也有帮助。


推荐阅读