python - Unexpected result in pandas pivot_table
Problem description
I am trying to run pivot_table on a pandas DataFrame. I am almost getting the expected result, but every count seems to be multiplied by two. I could just divide by two and call it a day; however, I want to know whether I am doing something wrong.
Here is the code:
import pandas as pd
import numpy as np
df = pd.DataFrame(data={"IND":[1,2,3,4,5,1,5,5],"DATA":[2,3,4,2,10,4,3,3]})
df_pvt = pd.pivot_table(df, aggfunc=np.size, index=["IND"], columns="DATA")
print(df_pvt)
The result is:
DATA   2    3    4    10
IND
1     2.0  NaN  2.0  NaN
2     NaN  2.0  NaN  NaN
3     NaN  NaN  2.0  NaN
4     2.0  NaN  NaN  NaN
5     NaN  4.0  NaN  2.0
However, each 2.0 should be 1.0 (and the 4.0 should be 2.0)! What am I misunderstanding or doing wrong?
Solution
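Use the string 'size' instead. This triggers the pandas interpretation of "size", i.e. the number of rows in each group. The NumPy interpretation of size is the total number of elements, i.e. the product of the lengths of each dimension. That explains the doubling here: each group evidently spans two columns, so a single row counts as 2 elements.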
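To see the two notions of size side by side, here is a small sketch. The two DataFrames are hypothetical stand-ins for groups, assuming each group reaches the callable aggfunc as a block that still holds both the IND and DATA columns, which is consistent with the 2.0 and 4.0 values in the question's output.

```python
import numpy as np
import pandas as pd

# Hypothetical groups, assuming each one reaches the aggfunc as a block
# that still holds both the IND and DATA columns.
single = pd.DataFrame({"IND": [1], "DATA": [2]})        # 1 row x 2 columns
double = pd.DataFrame({"IND": [5, 5], "DATA": [3, 3]})  # 2 rows x 2 columns

# NumPy's size counts every element, i.e. rows * columns.
print(np.size(single), np.size(double))  # 2 4

# The pandas notion of "size" for a group is its number of rows.
print(len(single), len(double))          # 1 2
```

The NumPy counts (2 and 4) match the doubled values in the question, while the row counts (1 and 2) match the expected ones.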
df_pvt = pd.pivot_table(df, aggfunc='size', index=["IND"], columns="DATA")
print(df_pvt)
DATA   2    3    4    10
IND
1     1.0  NaN  1.0  NaN
2     NaN  1.0  NaN  NaN
3     NaN  NaN  1.0  NaN
4     1.0  NaN  NaN  NaN
5     NaN  2.0  NaN  1.0
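As a cross-check, the same counts can be obtained without pivot_table by counting rows per (IND, DATA) pair with groupby(...).size() and spreading the DATA values into columns with unstack. This is a sketch of an alternative route, not part of the original answer. Pairs that never occur become NaN, so the counts come out as floats, as in the output above.

```python
import pandas as pd

df = pd.DataFrame(data={"IND": [1, 2, 3, 4, 5, 1, 5, 5],
                        "DATA": [2, 3, 4, 2, 10, 4, 3, 3]})

# Count rows per (IND, DATA) pair, then move the DATA level into columns.
counts = df.groupby(["IND", "DATA"]).size().unstack("DATA")
print(counts)
```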