python - How to store a class instance in HDF5
Problem description
TL;DR: Question is in the title. See code snippet.
I need to store pandas.DataFrame objects in a dictionary-like data structure and save them to disk. My current implementation uses a non-nested Python dict of the form Dict[str, pandas.DataFrame], and I write every pandas.DataFrame to disk once a minute as a CSV file. However, these two responsibilities (holding the data in memory and persisting it to disk) might be unified more elegantly with a data format such as HDF5.
One important constraint is that I cannot change the type of the objects stored in the pandas.DataFrame, and apparently not all object types can be stored in HDF5. The reason is that I'm implementing a third-party interface with predefined data types that need to be handled in their native form. Mapping instances to a different kind of object (e.g. instance to dict) would require an additional layer of logic to convert back and forth (dict to instance), which I want to avoid.
A similar question with an answer is here. However, I'm not necessarily interested in querying the stored instances afterwards. In addition, I would ideally keep the extra logic needed to serialise the instance to a minimum (if any is needed at all). Data compression is also not a concern. Alternatively, an answer could point to a well-established Python package that already encapsulates the logic for storing class instances in HDF5 or a similar data model.
import pandas as pd

class C:
    def __init__(self, a=0):
        self.a = a

    def return_42(self):
        return self.a

df = pd.DataFrame([C()])
df.dtypes
# 0    object
# dtype: object

store = pd.HDFStore('store1.hdf5')
store.append('c', pd.DataFrame([C()]))
# TypeError: Cannot serialize the column [0] because
# its data contents are [mixed] object dtype.
Solution
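One possible direction, offered as a minimal sketch rather than a definitive answer: if HDF5 itself is not a hard requirement (the question allows "a similar data model"), the standard-library shelve module gives a dict-like store on disk that pickles each value, so instances keep their native type without a hand-written mapping layer. The helper names save_items and load_items are hypothetical, and the sketch assumes the class is importable when the file is read back. A plain list of C instances stands in for the stored value here; a pandas.DataFrame holding such instances is also picklable and could be stored under a key in the same way.

```python
import os
import pickle
import shelve
import tempfile


class C:
    """Stand-in for the third-party type from the question."""

    def __init__(self, a=0):
        self.a = a

    def return_42(self):
        return self.a


def save_items(path, items):
    """Write every key/value pair of a flat dict to a shelve file (hypothetical helper)."""
    with shelve.open(path, protocol=pickle.HIGHEST_PROTOCOL) as db:
        for key, value in items.items():
            db[key] = value  # shelve pickles the value; no custom mapping layer


def load_items(path):
    """Read the whole shelve file back into an ordinary dict (hypothetical helper)."""
    with shelve.open(path) as db:
        return dict(db)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store1")
    save_items(path, {"c": [C(), C(a=42)]})
    restored = load_items(path)

# The restored objects are real C instances, methods included.
print(type(restored["c"][1]).__name__, restored["c"][1].return_42())
```

Because the values are pickled, the file should only be read back from a trusted source, and the stored data cannot be queried column-wise the way a native HDF5 table can.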