Saving and exporting dtypes information of a python pandas DataFrame

Problem description

I have a pandas DataFrame named df. I can print df.dtypes to the screen:

arrival_time      object
departure_time    object
drop_off_type      int64
extra             object
pickup_type        int64
stop_headsign     object
stop_id           object
stop_sequence      int64
trip_id           object
dtype: object

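I want to save this information so that I can compare it with other data, apply type conversions elsewhere, and so on. I want to save it to a local file and then restore it in another program, somewhere the data itself cannot go. But I cannot figure out how. Here are the results of the conversions I have tried.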

df.dtypes.to_dict()
{'arrival_time': dtype('O'),
 'departure_time': dtype('O'),
 'drop_off_type': dtype('int64'),
 'extra': dtype('O'),
 'pickup_type': dtype('int64'),
 'stop_headsign': dtype('O'),
 'stop_id': dtype('O'),
 'stop_sequence': dtype('int64'),
 'trip_id': dtype('O')}
----
df.dtypes.to_json()
'{"arrival_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"departure_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"drop_off_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"extra":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"pickup_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"stop_headsign":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_sequence":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"trip_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"}}'
----
json.dumps( df.dtypes.to_dict() )
...
TypeError: dtype('O') is not JSON serializable

----
list(df.dtypes)
[dtype('O'),
 dtype('O'),
 dtype('int64'),
 dtype('O'),
 dtype('int64'),
 dtype('O'),
 dtype('O'),
 dtype('int64'),
 dtype('O')]

How can I save and export/archive the dtype information of a pandas DataFrame?

Tags: python, json, pandas, dataframe, series

Solution

pd.DataFrame.dtypes returns a pd.Series object. This means you can manipulate it like any regular Series in pandas:

import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

res = df.dtypes.to_frame('dtypes').reset_index()

print(res)

  index   dtypes
0     A   object
1     B  float64
2     C    int64
3     D     bool

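Output to csv / excel / pickle

You can then use any method you would normally use to store a DataFrame, such as to_csv, to_excel, to_pickle, etc. Note that distributing a pickle is not recommended, because the format is version dependent.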
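As a minimal sketch of the CSV route, the res table built above can be written out and read back into a plain column-to-dtype-name mapping. The in-memory buffer here is only a stand-in for a real file path, and converting the dtypes to strings before writing is an assumption made to keep the stored values explicit:

```python
import io

import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

# Two-column table: column name and dtype name (as a plain string)
res = df.dtypes.astype(str).to_frame('dtypes').reset_index()

# An in-memory buffer stands in for a file on disk (illustrative only)
buf = io.StringIO()
res.to_csv(buf, index=False)

# Read the table back and rebuild a {column: dtype name} dictionary
buf.seek(0)
loaded = pd.read_csv(buf)
restored = loaded.set_index('index')['dtypes'].to_dict()

print(restored)
```

With a real file, passing a path such as 'types.csv' to to_csv and read_csv should work the same way.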

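Output to json

If you want an easy way to store and load the types as a dictionary, json is a popular format. As you found out, you need to convert to str type first: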

import json

# first create dictionary
d = res.set_index('index')['dtypes'].astype(str).to_dict()

with open('types.json', 'w') as f:
    json.dump(d, f)

with open('types.json', 'r') as f:
    data_types = json.load(f)

print(data_types)

{'A': 'object', 'B': 'float64', 'C': 'int64', 'D': 'bool'}
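Once the mapping is loaded, it can be used for the comparison and type conversion mentioned in the question. The following is a sketch only: the frame named other and its contents are hypothetical, and the data_types dictionary is written out literally so the block runs on its own rather than reading types.json:

```python
import pandas as pd

# Mapping as it would come back from json.load in the example above
data_types = {'A': 'object', 'B': 'float64', 'C': 'int64', 'D': 'bool'}

# Hypothetical frame with the same columns but different dtypes
other = pd.DataFrame({'A': ['x'], 'B': [1], 'C': [2.0], 'D': [1]})

# Compare: collect columns whose current dtype name differs from the saved one
mismatches = {
    col: str(dtype)
    for col, dtype in other.dtypes.items()
    if str(dtype) != data_types.get(col)
}
print(mismatches)

# Convert: astype accepts a {column: dtype} dictionary
converted = other.astype(data_types)
print(converted.dtypes)
```

Note that astype converts values rather than parsing them, so casting string columns to bool this way would turn every non-empty string into True. Such columns likely need explicit handling.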
