Pandas DataFrame - dictionary in rows to columns

Problem description

The DataFrame is built from a list of dictionaries, for example -
ls = [{'fileName': 'file_01', 'col1': {'key1': 'value1a', 'key2': 'value1b'}}, {'fileName': 'file_02', 'col1': {'key1': 'value2a', 'key2': 'value2b', 'key3':'value2c'}}, {'fileName': 'file_03', 'col1': {'key1': 'value3a', 'key3': 'value3c'}}]

The DataFrame is created with
df = pd.DataFrame(ls, columns=['fileName', 'col1'])

The pandas DataFrame df looks like this -

fileName     col1 
file_01      {'key1':value1a, 'key2':value1b}
file_02      {'key1':value2a, 'key2':value2b, 'key3':value2c}
file_03      {'key1':value3a, 'key3':value3c}

How can I convert it into this -

fileName     key1      key2      key3
file_01      value1a   value1b 
file_02      value2a   value2b   value2c
file_03      value3a             value3c

I tried -
df = pd.concat([df['fileName'], pd.get_dummies(df['col1'].apply(pd.Series))], axis=1)

In some cases I get a result like this -

fileName     key1_value1a     key1_value2a     key1_value3a
file_01      value1a           
file_02                       value2a   
file_03                                        value3a            

Tags: python, pandas, dictionary

Solution


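Use pd.json_normalize() to expand the dictionaries in col1 into separate columns, then concatenate the result with the fileName column. Keys that are missing from a row come out as NaN.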

In [40]: pd.concat([df['fileName'], pd.json_normalize(df['col1'])],axis=1)      
Out[40]: 
   fileName     key1     key2     key3
0   file_01  value1a  value1b      NaN
1   file_02  value2a  value2b  value2c
2   file_03  value3a      NaN  value3c
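
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It assumes pandas 1.0 or later (where pd.json_normalize is available at the top level) and that df has the default 0..n-1 index, since pd.concat aligns rows by index. The names expanded, result and alt are only illustrative, and the fillna('') step is an optional addition to get the blank cells shown in the question instead of NaN.

```python
import pandas as pd

ls = [
    {'fileName': 'file_01', 'col1': {'key1': 'value1a', 'key2': 'value1b'}},
    {'fileName': 'file_02', 'col1': {'key1': 'value2a', 'key2': 'value2b', 'key3': 'value2c'}},
    {'fileName': 'file_03', 'col1': {'key1': 'value3a', 'key3': 'value3c'}},
]
df = pd.DataFrame(ls, columns=['fileName', 'col1'])

# Expand each dict in col1 into its own columns; keys absent from a row become NaN.
expanded = pd.json_normalize(df['col1'].tolist())

# Put fileName back next to the expanded columns (rows are matched by index),
# then replace NaN with empty strings to mirror the layout asked for in the question.
result = pd.concat([df['fileName'], expanded], axis=1).fillna('')

# For flat (non-nested) dicts like these, building a DataFrame directly from
# the list of dicts should give the same columns.
alt = pd.concat([df['fileName'], pd.DataFrame(df['col1'].tolist())], axis=1).fillna('')

print(result)
```

As for the original attempt: pd.get_dummies one-hot encodes values, creating one indicator column per distinct column/value pair, which is why it produced columns such as key1_value1a rather than key1, key2 and key3. Dropping get_dummies and keeping only the expansion step avoids that.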
