Unexpected result in pandas pivot_table

Problem description

I am trying to build a pivot_table from a pandas DataFrame. I almost get the expected result, but every value appears to be multiplied by two. I could just divide by two and call it a day, but I want to know whether I am doing something wrong.

Here goes the code:

import pandas as pd
import numpy as np

df = pd.DataFrame(data={"IND":[1,2,3,4,5,1,5,5],"DATA":[2,3,4,2,10,4,3,3]})
df_pvt = pd.pivot_table(df, aggfunc=np.size, index=["IND"], columns="DATA")

df_pvt is now:

DATA   2    3    4    10
IND                     
1     2.0  NaN  2.0  NaN
2     NaN  2.0  NaN  NaN
3     NaN  NaN  2.0  NaN
4     2.0  NaN  NaN  NaN
5     NaN  4.0  NaN  2.0

However, instead of 2.0 it should be 1.0 (and 2.0 instead of 4.0)! What am I misunderstanding / doing wrong?

Tags: python, pandas, numpy, dataframe, pivot-table

Solution


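Use the string 'size' instead of np.size. The string triggers the pandas interpretation of "size", i.e. the number of rows in each group. The NumPy interpretation of size is the total number of elements, i.e. the product of the lengths of each dimension, which is why the counts can come out larger than the number of rows.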

df = pd.pivot_table(df, aggfunc='size', index=["IND"], columns="DATA")

print(df)

DATA   2    3    4    10
IND                     
1     1.0  NaN  1.0  NaN
2     NaN  1.0  NaN  NaN
3     NaN  NaN  1.0  NaN
4     1.0  NaN  NaN  NaN
5     NaN  2.0  NaN  1.0
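
To make the difference between the two meanings of "size" concrete, here is a minimal sketch using the same data. The variable names `block`, `n_elements`, `counts`, and `table` are only illustrative, and the `groupby(...).size().unstack()` route is a cross-check I am assuming is acceptable here, not something the pivot_table call is known to do internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={"IND": [1, 2, 3, 4, 5, 1, 5, 5],
                        "DATA": [2, 3, 4, 2, 10, 4, 3, 3]})

# NumPy's size counts every element of the array: rows * columns.
block = df[df["IND"] == 5]               # 3 rows, 2 columns
n_elements = np.size(block.to_numpy())   # 3 * 2 elements
print(n_elements)

# The pandas group size counts rows per group, regardless of column count.
counts = df.groupby(["IND", "DATA"]).size()

# Reshape to the same layout as the pivot table (illustrative cross-check).
table = counts.unstack()
print(table)
```

The row counts per (IND, DATA) pair should agree with the table printed above, while `np.size` on the two-column block reports twice the number of rows.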
