Error reading an empty Parquet file as a pandas DataFrame

Problem description

Following @TDrabas's suggestion, I investigated the problem further and have updated my question as follows:

I have a DataFrame like the following:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame([[1]])
df.index = pd.MultiIndex.from_tuples([('a','b','c')], names=["column1", "column2", "column3"])
df.columns = pd.MultiIndex.from_tuples([('a','b','c')], names=["column1", "column2", "column3"])

# narrow down to empty df
df = df.loc[[], []]
table = pa.Table.from_pandas(df)
pq.write_table(table, 'my_table.parquet')

# the following raises an AttributeError (traceback below)
pd.read_parquet('my_table.parquet')

# the following works fine
table = pq.read_table('my_table.parquet')
table
# pyarrow.Table
# column1: null
# column2: null
# column3: null
# metadata
table.shape
# (0, 3)

pd.read_parquet raises the following error:

  File "lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 1071, in <listcomp>
    for level, col_index in zip_longest(
AttributeError: 'dict' object has no attribute 'dtype'
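One plausible reading of this traceback, based on our understanding of pyarrow's pandas_compat module (version-dependent and not checked against the poster's pyarrow release): when the table has no data columns, pyarrow rebuilds the column index as a flat, empty Index with a single level, while the stored pandas metadata still describes three column-index levels. zip_longest pads the shorter sequence with {}, so a dict ends up where an Index level was expected, and reading its .dtype fails. The following is a simplified, pure-Python model of that pairing step; FakeLevel and pair_levels_with_metadata are hypothetical names, not pyarrow APIs.

```python
from itertools import zip_longest


class FakeLevel:
    """Hypothetical stand-in for a pandas Index level; only .dtype matters here."""

    def __init__(self, dtype):
        self.dtype = dtype


def pair_levels_with_metadata(levels, column_indexes):
    # Simplified model of the failing list comprehension: zip_longest pads the
    # shorter sequence with {} so every metadata entry gets some partner.
    # The default argument str(level.dtype) is evaluated before .get() runs.
    return [
        (level, col_index.get("pandas_type", str(level.dtype)))
        for level, col_index in zip_longest(levels, column_indexes, fillvalue={})
    ]


# Metadata describing a 3-level column MultiIndex (contents are illustrative).
column_indexes = [{"pandas_type": "unicode"}] * 3

# Case 1: three levels recovered, so the pairs line up and nothing fails.
ok = pair_levels_with_metadata([FakeLevel("object")] * 3, column_indexes)
print(len(ok))  # 3

# Case 2: no data columns, so only one flat level is recovered.
try:
    pair_levels_with_metadata([FakeLevel("object")], column_indexes)
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'dtype'
```

Under this reading, the df.loc[[]] workaround avoids the error because the single data column survives, so the column index can be rebuilt with all three levels and the pairing lines up.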

I found a workaround for this problem, but is there a more elegant way to do it?

df = df.loc[[]] # instead of df.loc[[], []]

Tags: python, pandas, parquet, pyarrow


