python - Saving and exporting the dtypes information of a pandas DataFrame
Question
I have a pandas DataFrame named df. With df.dtypes I can print the following to the screen:
arrival_time object
departure_time object
drop_off_type int64
extra object
pickup_type int64
stop_headsign object
stop_id object
stop_sequence int64
trip_id object
dtype: object
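I want to save this information so that I can compare it with other data, use it for type casting elsewhere, and so on. I want to write it to a local file and then restore it in another program, in a place where the data itself cannot go. But I cannot figure out how. Below are the results of the conversions I have tried.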
df.dtypes.to_dict()
{'arrival_time': dtype('O'),
'departure_time': dtype('O'),
'drop_off_type': dtype('int64'),
'extra': dtype('O'),
'pickup_type': dtype('int64'),
'stop_headsign': dtype('O'),
'stop_id': dtype('O'),
'stop_sequence': dtype('int64'),
'trip_id': dtype('O')}
----
df.dtypes.to_json()
'{"arrival_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"departure_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"drop_off_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"extra":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"pickup_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"stop_headsign":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_sequence":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"trip_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"}}'
----
json.dumps( df.dtypes.to_dict() )
...
TypeError: dtype('O') is not JSON serializable
----
list(df.dtypes)
[dtype('O'),
dtype('O'),
dtype('int64'),
dtype('O'),
dtype('int64'),
dtype('O'),
dtype('O'),
dtype('int64'),
dtype('O')]
How can I save and export/archive the dtype information of a pandas DataFrame?
Solution
pd.DataFrame.dtypes returns a pd.Series object. This means you can manipulate it like any regular Series in pandas:
import pandas as pd
df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})
res = df.dtypes.to_frame('dtypes').reset_index()
print(res)
index dtypes
0 A object
1 B float64
2 C int64
3 D bool
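Because df.dtypes is an ordinary Series, the usual Series operations apply to it. The following is a minimal sketch under that assumption; the variable names (dtype_names, int_cols, counts) are illustrative only and not part of the original answer. It converts the dtypes to their string names first, so the comparison is a plain string comparison.

```python
import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

# Series indexed by column name, holding each dtype's name as a string
dtype_names = df.dtypes.astype(str)

# Boolean-mask the Series to find the columns of a given dtype
int_cols = dtype_names[dtype_names == 'int64'].index.tolist()
print(int_cols)

# Count how many columns share each dtype
counts = dtype_names.value_counts()
print(counts)
```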
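Output to csv / excel / pickle
You can then use any method you would normally use to store a DataFrame, such as to_csv, to_excel, to_pickle, etc. Note that distributing pickle files is not recommended, because the format is version dependent.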
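As a rough illustration of the CSV route, the sketch below writes the dtype table and reads it back into a plain mapping. It uses an in-memory buffer so that it runs without touching the disk; a file path passed to to_csv and read_csv should behave the same way. The column labels 'column' and 'dtype' and the variable names are assumptions made for this example.

```python
import io
import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

# Two-column table: column name and dtype name, both as plain strings
res = df.dtypes.astype(str).rename_axis('column').reset_index(name='dtype')

# Write to an in-memory buffer (stand-in for a file on disk)
buf = io.StringIO()
res.to_csv(buf, index=False)

# Read the table back and rebuild a {column name: dtype name} mapping
buf.seek(0)
loaded = pd.read_csv(buf)
dtype_map = dict(zip(loaded['column'], loaded['dtype']))
print(dtype_map)
```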
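Output to json
If you want something that is easy to store and load as a dictionary, a popular format is json. As you saw in your own attempts, you need to convert the dtypes to str first: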
import json
# first create dictionary
d = res.set_index('index')['dtypes'].astype(str).to_dict()
with open('types.json', 'w') as f:
    json.dump(d, f)
# load the dictionary back from the file
with open('types.json', 'r') as f:
    data_types = json.load(f)
print(data_types)
{'A': 'object', 'B': 'float64', 'C': 'int64', 'D': 'bool'}
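The question also mentions comparing the saved dtypes with other data and casting elsewhere. The following is a sketch of how the loaded mapping might be used for both, assuming the dtype names are simple ones that astype accepts as strings. The frame named other and the variable names are hypothetical, and a JSON string stands in for the types.json file. Dtypes that carry extra state, such as categoricals, hold more than their name, so this should only be assumed to cover simple cases.

```python
import json
import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

# Save: dtype names as a JSON string (json.dump to a file works the same way)
saved = json.dumps(df.dtypes.astype(str).to_dict())

# Elsewhere: a frame with the same columns but different numeric dtypes
other = pd.DataFrame({'A': ['x'], 'B': [2], 'C': [7.0], 'D': [True]})
dtype_map = json.loads(saved)

# Compare: collect columns whose current dtype differs from the saved one
actual = other.dtypes.astype(str).to_dict()
mismatched = {col: (expected, actual.get(col))
              for col, expected in dtype_map.items()
              if actual.get(col) != expected}
print(mismatched)

# Cast: astype accepts a {column name: dtype name} mapping
restored = other.astype(dtype_map)
print(restored.dtypes)
```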