Getting keys and values from a nested dictionary in list format into a DataFrame

Question

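I have a deeply nested list of dictionaries. I'm trying to capture the keys from specific nested dictionaries and convert them into a DataFrame. How can I do this? I have basic knowledge of dictionaries and can get at the keys, and I tried appending with [] and {}, but it didn't work well. Any guidance is appreciated!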

import pandas as pd
from pprint import pprint

d = {'Main':{
            'SecondLevel':
                    [{'Identifier':'abc',
                     'StudentInfo':{'Name':'Mike','Grade':'1',
                                    'TeachersAssigned':[{'Name':'Paul'},
                                                        {'Name':'Smith'}
                                                       ]}},
                    {
                     'StudentInfo':{'Name':'Mandy','Grade':'1',
                                    'TeachersAssigned':[{'Name':'Baker'},
                                                        {'Name':'Smith'}
                                                       ]}}]}}
pprint(d)

list_dict = []
for doc in d['Main']['SecondLevel']:
    identifier = '' if doc.get('Identifier') is None else doc['Identifier']
    studentname = doc['StudentInfo']['Name']

    list_dict.append(identifier)
    list_dict.append(studentname)

    for teach in doc['StudentInfo']['TeachersAssigned']:
        teachers_name = teach['Name']

        list_dict.append(teachers_name)

pprint(list_dict)

>>> ['abc', 'Mike', 'Paul', 'Smith', '', 'Mandy', 'Baker', 'Smith']

pd.DataFrame(list_dict)
>>> single column with list of the values from above

I'm trying to get it to look like this:

Identifier   StudentInfo    TeachersAssigned
abc          Mike           Paul
abc          Mike           Smith
             Mandy          Baker
             Mandy          Smith

Am I structuring the for loop incorrectly, or should this be a list comprehension?

Tags: python, json, python-3.x, pandas, dictionary

Solution

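Given your dictionary, this is how I would handle it. As I explained earlier, a DataFrame cannot have columns of different lengths, so you can fill any missing value with np.nan.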
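As a quick aside on the length constraint, here is a minimal sketch (the two-row sample data is made up for illustration and is not the asker's dictionary). Building a DataFrame from a dict of lists with unequal lengths raises a ValueError, while padding the short list with np.nan works:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 'Identifier' has one value fewer than 'Name'.
try:
    pd.DataFrame({'Name': ['Mike', 'Mandy'], 'Identifier': ['abc']})
    raised = False
except ValueError:
    # pandas refuses to build a frame from columns of unequal length.
    raised = True

print(raised)

# Padding the short column with a missing value makes the lengths match.
padded = pd.DataFrame({'Name': ['Mike', 'Mandy'], 'Identifier': ['abc', np.nan]})
print(padded)
```

This is why the loop in the answer appends to all three lists on every iteration, so the columns always stay the same length.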

import numpy as np
import pandas as pd
d = {'Main':{
            'SecondLevel':
                    [{'Identifier':'abc',
                     'StudentInfo':{'Name':'Mike','Grade':'1',
                                    'TeachersAssigned':[{'Name':'Paul'},
                                                        {'Name':'Smith'}
                                                       ]}},
                    {
                     'StudentInfo':{'Name':'Mandy','Grade':'1',
                                    'TeachersAssigned':[{'Name':'Baker'},
                                                        {'Name':'Smith'}
                                                       ]}}]}}
data = {'Identifier':[],'Name':[],'TeachersAssigned':[]}
# Emit one row per (student, teacher) pair so all three lists stay the same length.
for doc in d['Main']['SecondLevel']:
    info = doc['StudentInfo']
    for teacher in info['TeachersAssigned']:
        # Fall back to NaN when a record has no 'Identifier' key.
        data['Identifier'].append(doc.get('Identifier', np.nan))
        data['Name'].append(info['Name'])
        data['TeachersAssigned'].append(teacher['Name'])
df = pd.DataFrame(data)
print(df)

Output:

  Identifier   Name TeachersAssigned
0        abc   Mike             Paul
1        abc   Mike            Smith
2        NaN  Mandy            Baker
3        NaN  Mandy            Smith
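
As a possible alternative to the explicit loop, pandas can flatten this kind of structure itself. The sketch below is not part of the original answer; it assumes pandas 1.0 or later, where `pd.json_normalize` is available at the top level, and the `Teacher.` prefix and the final column names are choices made here for illustration. `record_path` points at the list that should become rows, `meta` lists the parent fields to repeat on each row, and `errors='ignore'` fills a missing `Identifier` with NaN instead of raising.

```python
import pandas as pd

d = {'Main': {
        'SecondLevel': [
            {'Identifier': 'abc',
             'StudentInfo': {'Name': 'Mike', 'Grade': '1',
                             'TeachersAssigned': [{'Name': 'Paul'},
                                                  {'Name': 'Smith'}]}},
            {'StudentInfo': {'Name': 'Mandy', 'Grade': '1',
                             'TeachersAssigned': [{'Name': 'Baker'},
                                                  {'Name': 'Smith'}]}}]}}

df = pd.json_normalize(
    d['Main']['SecondLevel'],
    # One output row per entry in each student's TeachersAssigned list.
    record_path=['StudentInfo', 'TeachersAssigned'],
    # Parent fields repeated on every row of that student.
    meta=['Identifier', ['StudentInfo', 'Name']],
    # Prefix the teacher's 'Name' so it is easy to tell apart from the student's.
    record_prefix='Teacher.',
    # A record without 'Identifier' gets NaN instead of raising KeyError.
    errors='ignore',
)

# Rename and reorder to match the layout asked for in the question.
df = df.rename(columns={'StudentInfo.Name': 'Name',
                        'Teacher.Name': 'TeachersAssigned'})
df = df[['Identifier', 'Name', 'TeachersAssigned']]
print(df)
```

The loop version is easier to adapt when rows need custom logic; `json_normalize` tends to be shorter when the nesting is regular.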
