DataFrame groupby column overlaps with the DataFrame index when calculating std

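Problem description

For illustration, I have the data below. Grouping by the days column works well with cumsum, but when I apply std with the same grouping by days, the result is unintentionally aligned by index number instead.

Working code: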

df['Vol'] = df['Lot2'].groupby(df['days']).cumsum()

      Lot2    days    mid    Vol        VWAPx       std
0     550.0      1  75.35    550.0  75.350000       NaN
1     619.0      1  75.30   1169.0  75.323524  0.075410
2     2227.0     1  75.30   3396.0  75.308098  0.710670
3     1776.0     1  75.30   5172.0  75.305317       NaN
4     1000.0     1  75.35   6172.0  75.312557       NaN
5     6274.0     1  75.40  12446.0  75.356637  0.143375
6     5000.0     1  75.35  17446.0  75.354735  0.190802
7     420.0      1  75.35  17866.0  75.354623  0.225577
8     108.0      1  75.30  17974.0  75.354295  0.374943
9     132.0      1  75.35  18106.0  75.354264  0.122366

The problem is with this line (it produces the std column shown above):

df['std'] = df['VWAPx'].groupby(df['days']).std()

The std value should be the same for every row within a given days group.

df['std'] = df['VWAPx'].groupby(df['days'], as_index=False).std()

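This returns TypeError: as_index=False only valid with DataFrame

Note: the days column consists of numbers between 1 and 30, with some values missing (for example, 3 and 4 are not included).

Tags: python, pandas, dataframe, pandas-groupby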

Solution


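Use transform. groupby(...).std() returns one value per group, indexed by the values of days, so assigning it to df['std'] aligns those day values with the row index: row 1 gets the std of day 1, and rows whose index matches no day (here 0, 3 and 4) get NaN. transform('std') instead returns a result with the same index as df, with each group's std repeated on every row of that group: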

df['std'] = df['VWAPx'].groupby(df['days']).transform('std')
print(df)

     Lot2  days    mid      Vol      VWAPx       std
0   550.0     1  75.35    550.0  75.350000  0.022094
1   619.0     1  75.30   1169.0  75.323524  0.022094
2  2227.0     1  75.30   3396.0  75.308098  0.022094
3  1776.0     1  75.30   5172.0  75.305317  0.022094
4  1000.0     1  75.35   6172.0  75.312557  0.022094
5  6274.0     1  75.40  12446.0  75.356637  0.022094
6  5000.0     1  75.35  17446.0  75.354735  0.022094
7   420.0     1  75.35  17866.0  75.354623  0.022094
8   108.0     1  75.30  17974.0  75.354295  0.022094
9   132.0     1  75.35  18106.0  75.354264  0.022094
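
To see the alignment issue in isolation, here is a minimal sketch on hypothetical toy data. The values and the names per_day, misaligned and std_map are made up for illustration and are not taken from the question. It compares direct assignment of the aggregated result with transform, and also shows one possible alternative that maps the per-day result back onto the rows.

```python
import pandas as pd

# Hypothetical toy data: days takes the values 1, 2 and 5 (3 and 4 are missing)
df = pd.DataFrame({
    "days":  [1, 1, 1, 2, 2, 5, 5],
    "VWAPx": [75.35, 75.32, 75.31, 75.40, 75.36, 75.30, 75.34],
})

# Aggregation: one value per group, indexed by the group keys (1, 2, 5)
per_day = df["VWAPx"].groupby(df["days"]).std()

# Direct assignment aligns per_day's index (day values) with df's row labels,
# so only rows 1, 2 and 5 receive a value; every other row becomes NaN.
misaligned = df.copy()
misaligned["std"] = per_day

# transform broadcasts each group's std back onto the rows of that group
df["std"] = df["VWAPx"].groupby(df["days"]).transform("std")

# Alternative: look up each row's day in the aggregated Series
df["std_map"] = df["days"].map(per_day)

print(misaligned)
print(df)
```

The map variant can be handy when the per-day Series is needed anyway (for a report, say); otherwise transform does the same thing in a single step.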
