DataFrame - joining three data frames

Problem description

There are three data frames, defined as follows:

import pandas as pd
first = pd.DataFrame(columns=['id', 'type', 'year'])
second = pd.DataFrame(columns=['id', 'type', 'year'])
third = pd.DataFrame(columns=['id', 'type', 'year'])

and:

first.info()
second.info()
third.info()

Output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14 entries, 0 to 19
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   id      14 non-null     int64
 1   type    14 non-null     int64
 2   year    14 non-null     int64
dtypes: int64(3)
memory usage: 448.0 bytes

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 4 to 12
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   id      2 non-null      int64
 1   type    2 non-null      int64
 2   year    2 non-null      int64
dtypes: int64(3)
memory usage: 64.0 bytes

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 7 to 6
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   id      4 non-null      int64
 1   type    4 non-null      int64
 2   year    4 non-null      int64
dtypes: int64(3)
memory usage: 128.0 bytes

I want to combine these three data frames and write them to a JSON file with the following structure:

{
  "first": {
    "1": [[149], [42]],
    "21": [[101], [234]],
    ...
  },
  "second": {
    "14": [[159], [425]],
    "5": [[1051], [5234]],
    ...
  },
  "third": {
    "6": [[3], [4443]],
    "77": [[65], [4]],
    ...
  }
}

I tried the following and got an error.

all = {first, second, third}

with open('output.json', 'w') as fp:
    json.dump(all, fp)

It produces corrupted output and raises the following error:

TypeError: Object of type Series is not JSON serializable

Any help would be greatly appreciated.

Tags: python, json, dataframe

Solution


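The problem is that you are trying to JSON-serialize a collection of DataFrame objects. The json module only handles built-in types such as dict, list, str, int and float, so you must first convert each data frame into a dictionary of the desired form before calling json.dump.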

import json

def convert2dict(df):
    # Convert one data frame into a dictionary of the desired form:
    # {"<id>": [[type], [year]], ...}
    # int() guards against NumPy integer types, which json cannot serialize.
    return {str(r['id']): [[int(r['type'])], [int(r['year'])]] for r in df.to_dict('records')}

# Place them all into one large dictionary
all_dfs = {
    "first": convert2dict(first),
    "second": convert2dict(second),
    "third": convert2dict(third)
}

# Write the combined dictionary to disk
with open('output.json', 'w') as fp:
    json.dump(all_dfs, fp)
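
To check the approach end to end, here is a minimal self-contained sketch. The question does not show the real rows, so the sample values below are borrowed from the desired output and are assumptions; the helper name `frame_to_mapping` is hypothetical. It follows the same idea as above (id as string key, type and year each wrapped in a list) and casts every value with int() as a precaution.

```python
import json

import pandas as pd


def frame_to_mapping(df):
    """Map each id (as a string key) to [[type], [year]]."""
    return {
        str(int(row_id)): [[int(type_val)], [int(year_val)]]
        for row_id, type_val, year_val in zip(df["id"], df["type"], df["year"])
    }


# Assumed sample rows, taken from the desired output in the question;
# the real frames hold 14, 2 and 4 rows respectively.
first = pd.DataFrame({"id": [1, 21], "type": [149, 101], "year": [42, 234]})
second = pd.DataFrame({"id": [14, 5], "type": [159, 1051], "year": [425, 5234]})
third = pd.DataFrame({"id": [6, 77], "type": [3, 65], "year": [4443, 4]})

combined = {
    "first": frame_to_mapping(first),
    "second": frame_to_mapping(second),
    "third": frame_to_mapping(third),
}

# json.dumps returns the JSON text directly, which is handy for a quick check
json_text = json.dumps(combined, indent=1)
print(json_text)
```

Note that dictionary keys are unique, so if an id occurs more than once within a frame, only the last matching row survives in the mapping.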
