Converting a sub OrderedDict to a DataFrame

Problem description

I am using Jupyter and have accessed data from Airtable's API. It is now stored as multiple OrderedDicts. I need to convert this data into separate dataframes.

OrderedDict([('records',
                  [OrderedDict([('id', 'rec0O8L1dlrobrPtj'),
                                ('fields', OrderedDict()),
                                ('createdTime', '2018-05-18T05:36:54.000Z')]),
                   OrderedDict([('id', 'rec13WqEutT0SwIP0'),
                                ('fields',
                                 OrderedDict([('Lead ID', '64556'),
                                              ('Company Name',
                                               'CesKath (Ukay-Ukay) / KRKK Online Shop'),
                                              ('Client Name',
                                               'Kamille Rona Venturina Taytay'),
                                              ('Principal Defendant Name/s',
                                               'n/a'),
                                              ('Co-Defendant Name/s', 'n/a'),
                                              ('Plaintiff', 'n/a'),
                                              ('Nature of Case', 'n/a'),
                                              ('Trial Court', 'n/a'),
                                              ('City/Province', 'n/a'),
                                              ('Sala No.', 'n/a'),
                                              ('Case Number', 'n/a'),
                                              ('Case Status', 'n/a'),
                                              ('Address', 'n/a')])),

I have tried the following code which converts everything to a single dataframe.

df = pd.DataFrame.from_dict(data)     

When I execute this code it produces the following:

     records                     offset
0   {'id': 'rec0O8L1dlrobrPtj', itr67AuLTHCfW40zH/recblaoEXrMrbx7Yt
1   {'id': 'rec13WqEutT0SwIP0', itr67AuLTHCfW40zH/recblaoEXrMrbx7Yt
2   {'id': 'rec22sGXgPU9hFbTq', itr67AuLTHCfW40zH/recblaoEXrMrbx7Yt
3   {'id': 'rec2a4MQL24dQhGzI', itr67AuLTHCfW40zH/recblaoEXrMrbx7Yt
4   {'id': 'rec3VBhG7u55BQsFy', itr67AuLTHCfW40zH/recblaoEXrMrbx7Yt

I need to access the OrderedDict at the third level of nesting, i.e.:

                                              ('Lead ID', '64556'),
                                              ('Company Name',
                                               'CesKath (Ukay-Ukay) / KRKK Online Shop'),
                                              ('Client Name',
                                               'Kamille Rona Venturina Taytay'),
                                              ('Principal Defendant Name/s',
                                               'n/a'),
                                              ('Co-Defendant Name/s', 'n/a'),
                                              ('Plaintiff', 'n/a'),
                                              ('Nature of Case', 'n/a'),
                                              ('Trial Court', 'n/a'),
                                              ('City/Province', 'n/a'),
                                              ('Sala No.', 'n/a'),
                                              ('Case Number', 'n/a'),
                                              ('Case Status', 'n/a'),
                                              ('Address', 'n/a')])),

How exactly can I access the sub-OrderedDict and convert it to a dataframe?

Tags: python, dataframe, ordereddictionary

Solution


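Here is one approach: pull the "fields" mapping out of each record and pass the resulting list to the DataFrame constructor.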

Demo:

from collections import OrderedDict
import pandas as pd

data = OrderedDict([('records',
                  [OrderedDict([('id', 'rec0O8L1dlrobrPtj'),
                                ('fields', OrderedDict()),
                                ('createdTime', '2018-05-18T05:36:54.000Z')]),
                   OrderedDict([('id', 'rec13WqEutT0SwIP0'),
                                ('fields',
                                 OrderedDict([('Lead ID', '64556'),
                                              ('Company Name',
                                               'CesKath (Ukay-Ukay) / KRKK Online Shop'),
                                              ('Client Name',
                                               'Kamille Rona Venturina Taytay'),
                                              ('Principal Defendant Name/s',
                                               'n/a'),
                                              ('Co-Defendant Name/s', 'n/a'),
                                              ('Plaintiff', 'n/a'),
                                              ('Nature of Case', 'n/a'),
                                              ('Trial Court', 'n/a'),
                                              ('City/Province', 'n/a'),
                                              ('Sala No.', 'n/a'),
                                              ('Case Number', 'n/a'),
                                              ('Case Status', 'n/a'),
                                              ('Address', 'n/a')]))])
                   ]
              )])

df = pd.DataFrame([d["fields"] for d in data["records"]])
print(df)

Output:

  Lead ID                            Company Name  \
0     NaN                                     NaN   
1   64556  CesKath (Ukay-Ukay) / KRKK Online Shop   

                     Client Name Principal Defendant Name/s  \
0                            NaN                        NaN   
1  Kamille Rona Venturina Taytay                        n/a   

  Co-Defendant Name/s Plaintiff Nature of Case Trial Court City/Province  \
0                 NaN       NaN            NaN         NaN           NaN   
1                 n/a       n/a            n/a         n/a           n/a   

  Sala No. Case Number Case Status Address  
0      NaN         NaN         NaN     NaN  
1      n/a         n/a         n/a     n/a  
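
Row 0 in the output above is all NaN because the first record has an empty "fields" mapping. If that row is unwanted, or if the record id is useful as an index, a variation like the following might help. This is only a sketch: the sample data is a trimmed-down stand-in for the response in the question (two of the field names, not the full set), and it assumes every record carries both an "id" and a "fields" key.

```python
from collections import OrderedDict

import pandas as pd

# Trimmed-down stand-in for the API response shown in the question.
data = OrderedDict([
    ("records", [
        OrderedDict([("id", "rec0O8L1dlrobrPtj"),
                     ("fields", OrderedDict())]),
        OrderedDict([("id", "rec13WqEutT0SwIP0"),
                     ("fields", OrderedDict([("Lead ID", "64556"),
                                             ("Plaintiff", "n/a")]))]),
    ]),
])

# Merge each record's id with its fields, skipping records whose
# "fields" mapping is empty (they would only produce an all-NaN row).
rows = [
    {"id": record["id"], **record["fields"]}
    for record in data["records"]
    if record["fields"]
]

# Use the record id as the index so each row can be traced back to its record.
df = pd.DataFrame(rows).set_index("id")
print(df)
```

Keeping the id as the index makes it easier to match rows back to the original records, which the plain list of "fields" mappings does not allow.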
