首页 > 解决方案 > 从 pandas groupby 在每个组中执行操作(滚动平均值/添加新列)的有效方法

问题描述

我有一个像下面这样的数据框,有 2 个不同的SecuCode. 我需要按 column 对它们进行分组,从andSecuCode创建一个新列,然后进行滚动平均。VolumeAShare

    SecuCode    TradingDay  Volume  AShare
0   600455.SH   2013-01-04  1484606 49717768
1   600455.SH   2013-01-07  1315166 49717768
2   600455.SH   2013-01-08  1675933 49717768
3   600455.SH   2013-01-09  1244098 49717768
4   600455.SH   2013-01-10  751279  49717768
5   600551.SH   2018-12-24  1166098 505825296
6   600551.SH   2018-12-25  3285799 505825296
7   600551.SH   2018-12-26  3534143 505825296
8   600551.SH   2018-12-27  2462501 505825296
9   600551.SH   2018-12-28  2282954 505825296

对于单个数据帧,代码将如下所示:

df['volumn_percentage'] = df['Volume']/df['AShare']
df['turnover'] = df['volumn_percentage'].rolling(2).mean()

我需要在每一帧中执行这些多步操作

df_grouped = df.groupby(by='SecuCode')
state, frame = next(iter(df_grouped))

我想知道如何在这个数据帧的每一组中做同样的事情,然后恢复到原始数据帧格式?会有 ~1000 个 unique SecuCode,并且会有 ~1000 个交易日,因此循环遍历所有帧/组似乎非常慢。除了上述 2 之外,我可能需要执行更多操作(1 列除以另一列,滚动平均值),因此非常感谢更通用的方法。

标签: pythonpandasdataframepandas-groupby

解决方案


import pandas as pd

data = {'SecuCode': ['600455.SH', '600455.SH', '600455.SH', '600455.SH', '600455.SH', '600551.SH', '600551.SH', '600551.SH', '600551.SH', '600551.SH'],
        'TradingDay': ['2013-01-04', '2013-01-07', '2013-01-08', '2013-01-09', '2013-01-10', '2018-12-24', '2018-12-25', '2018-12-26', '2018-12-27', '2018-12-28'],
        'Volume': [1484606, 1315166, 1675933, 1244098, 751279, 1166098, 3285799, 3534143, 2462501, 2282954],
        'AShare': [49717768, 49717768, 49717768, 49717768, 49717768, 505825296, 505825296, 505825296, 505825296, 505825296]}

df = pd.DataFrame(data)

# function with calculations
def calcs(df: pd.DataFrame) -> pd.DataFrame:
    df['volumn_percentage'] = df['Volume']/df['AShare']
    df['turnover'] = df['volumn_percentage'].rolling(2).mean()
    return df


# groupby and apply the function with the calculations
df_new = df.groupby('SecuCode').apply(calcs)

# print(df_new)
    SecuCode  TradingDay   Volume     AShare  volumn_percentage  turnover
0  600455.SH  2013-01-04  1484606   49717768           0.029861       NaN
1  600455.SH  2013-01-07  1315166   49717768           0.026453  0.028157
2  600455.SH  2013-01-08  1675933   49717768           0.033709  0.030081
3  600455.SH  2013-01-09  1244098   49717768           0.025023  0.029366
4  600455.SH  2013-01-10   751279   49717768           0.015111  0.020067
5  600551.SH  2018-12-24  1166098  505825296           0.002305       NaN
6  600551.SH  2018-12-25  3285799  505825296           0.006496  0.004401
7  600551.SH  2018-12-26  3534143  505825296           0.006987  0.006741
8  600551.SH  2018-12-27  2462501  505825296           0.004868  0.005928
9  600551.SH  2018-12-28  2282954  505825296           0.004513  0.004691

推荐阅读