Pandas - Get the mean of one column over the last N rows, in descending order of another column

Question

I have this dataframe:

             team        opponent home_dummy     round     points
0     Athlético-PR       Flamengo          0        13      22.91
1     Athlético-PR    Atlético-GO          0        17       23.6
2     Athlético-PR      Fortaleza          1        20      28.58
3     Athlético-PR      Fortaleza          0         1      75.71
4     Athlético-PR          Ceará          1        14      42.22
5     Athlético-PR       Coritiba          1        10      52.91
6     Athlético-PR          Goiás          1         2      39.82
7     Athlético-PR          Goiás          0        21      65.13
8     Athlético-PR  Internacional          0        15      43.09
9     Athlético-PR         Grêmio          1        18      15.38
10    Athlético-PR          Sport          0        19      13.09
11    Athlético-PR         Santos          1        22      65.45
12    Athlético-PR         Santos          0         3      28.04
13    Athlético-PR      Palmeiras          1         4      -7.31
14    Athlético-PR      Palmeiras          0        23      11.02
15    Athlético-PR          Vasco          0         8      15.93
16    Athlético-PR     Fluminense          1         5       9.16
17    Athlético-PR          Bahia          1        12      59.78
18    Athlético-PR    Corinthians          1        16      18.22
19    Athlético-PR       Botafogo          1         9      29.35
20    Athlético-PR     Bragantino          1         7      20.07
.......

Besides "Athlético-PR", the dataframe above contains 19 other teams.


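How can I group this dataframe by team to get:

  1. the mean of points over the last N rounds, e.g. N=6, which would average rounds 23, 22, 21, 20, 19, 18
  2. the mean of points over the last N rounds, split by 'home_dummy', which would average rounds 23, 21, 19, 17, 15, 13 or rounds 22, 20, 18, 16, 14, 12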
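The selection described in the two items can be checked directly. A minimal sketch, assuming points is numeric and using only the Athlético-PR rows shown above (the opponent column is omitted): sort by round in descending order, then keep the first N rows of each group, which is what "last N rounds" means here. N = 6 matches the example in the question.

```python
import pandas as pd

N = 6

# Only the Athlético-PR rows shown in the question; the other 19 teams are not shown.
df = pd.DataFrame({
    "team": "Athlético-PR",
    "home_dummy": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1],
    "round": [13, 17, 20, 1, 14, 10, 2, 21, 15, 18, 19, 22, 3, 4, 23, 8, 5, 12,
              16, 9, 7],
    "points": [22.91, 23.6, 28.58, 75.71, 42.22, 52.91, 39.82, 65.13, 43.09,
               15.38, 13.09, 65.45, 28.04, -7.31, 11.02, 15.93, 9.16, 59.78,
               18.22, 29.35, 20.07],
})

# Most recent round first, so head(N) per group picks the last N rounds.
desc = df.sort_values("round", ascending=False)
last_total = desc.groupby("team").head(N)
last_by_home = desc.groupby(["team", "home_dummy"]).head(N)

print(last_total["round"].tolist())
# → [23, 22, 21, 20, 19, 18]
for home, grp in last_by_home.groupby("home_dummy"):
    print(home, grp["round"].tolist())
# → 0 [23, 21, 19, 17, 15, 13]
# → 1 [22, 20, 18, 16, 14, 12]
```

Note that rounds 6 and 11 do not appear in the rows shown, so "last N rows" and "last N round numbers" are not always the same thing; this sketch counts rows.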

I'd like to end up with:

    team           mean_total   mean_home_0   mean_home_1
 0  Athlético-PR       mean x        mean y        mean z
  ...
   

Tags: pandas

Solution


I think you can do two separate groupbys, one per (team, home_dummy) pair and one per team:

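# Sort so that tail(6) picks the 6 most recent rounds in each group
df = df.sort_values(['team','round'])

# Mean of the last 6 rounds per team and home_dummy value, one column per value
out = (df.groupby(['team','home_dummy']).tail(6)
         .groupby(['team','home_dummy'])['points'].mean()
         .unstack('home_dummy')
         .add_prefix('mean_home_')
      )

# Mean of the last 6 rounds per team, ignoring home_dummy; aligned on the team index
out['mean_total'] = df.groupby('team').tail(6).groupby('team')['points'].mean()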

Output:

home_dummy    mean_home_0  mean_home_1  mean_total
team                                              
Athlético-PR    29.806667    38.271667   33.108333
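The answer hard-codes 6 in two places. A small sketch that makes N a parameter, wrapping the same sort_values + tail + mean steps in a hypothetical helper last_n_means (an illustrative name, not a pandas function). Run on the Athlético-PR rows shown in the question, it gives the same three means as the output above.

```python
import pandas as pd


def last_n_means(df: pd.DataFrame, n: int = 6) -> pd.DataFrame:
    """Hypothetical helper: per-team means of points over the last n rounds."""
    # Sort so that tail(n) picks the n most recent rounds in each group.
    df = df.sort_values(["team", "round"])
    # Mean of the last n rounds per (team, home_dummy), one column per dummy value.
    out = (df.groupby(["team", "home_dummy"]).tail(n)
             .groupby(["team", "home_dummy"])["points"].mean()
             .unstack("home_dummy")
             .add_prefix("mean_home_"))
    # Mean of the last n rounds per team regardless of home_dummy; aligns on team.
    out["mean_total"] = df.groupby("team").tail(n).groupby("team")["points"].mean()
    return out


# Only the Athlético-PR rows shown in the question (opponent column omitted).
df = pd.DataFrame({
    "team": "Athlético-PR",
    "home_dummy": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1],
    "round": [13, 17, 20, 1, 14, 10, 2, 21, 15, 18, 19, 22, 3, 4, 23, 8, 5, 12,
              16, 9, 7],
    "points": [22.91, 23.6, 28.58, 75.71, 42.22, 52.91, 39.82, 65.13, 43.09,
               15.38, 13.09, 65.45, 28.04, -7.31, 11.02, 15.93, 9.16, 59.78,
               18.22, 29.35, 20.07],
})

result = last_n_means(df, n=6)
print(result.round(6))
```

Copying the DataFrame inside the helper (sort_values returns a new frame) keeps the caller's df unchanged, unlike reassigning df at the top level.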

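Another option is to write a UDF that reduces each pair of groupbys to a single one (this still assumes df has been sorted by team and round as above):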

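# Mean of the last 6 values of a group; assumes rows are already sorted by round
def last6mean(x):
    return x.tail(6).mean()

# One column per home_dummy value, one row per team
out = (df.groupby(['team','home_dummy'])['points']
        .apply(last6mean)
        .unstack('home_dummy')
        .add_prefix('mean_home_')
     )

out['mean_total'] = df.groupby('team')['points'].apply(last6mean)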
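Both versions above depend on df having been sorted by round first. As a swapped-in alternative (not part of the answer), here is a sketch that ranks round in descending order within each group with SeriesGroupBy.rank and keeps only rows whose rank is at most N, so the original row order does not matter. N = 6 and the data (only the Athlético-PR rows shown in the question, left unsorted) are assumptions for illustration.

```python
import pandas as pd

N = 6

# Only the Athlético-PR rows shown in the question, in their original (unsorted) order.
df = pd.DataFrame({
    "team": "Athlético-PR",
    "home_dummy": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1],
    "round": [13, 17, 20, 1, 14, 10, 2, 21, 15, 18, 19, 22, 3, 4, 23, 8, 5, 12,
              16, 9, 7],
    "points": [22.91, 23.6, 28.58, 75.71, 42.22, 52.91, 39.82, 65.13, 43.09,
               15.38, 13.09, 65.45, 28.04, -7.31, 11.02, 15.93, 9.16, 59.78,
               18.22, 29.35, 20.07],
})

# Rank rounds within each group, 1 = most recent round (descending order).
rank_total = df.groupby("team")["round"].rank(ascending=False, method="first")
rank_home = df.groupby(["team", "home_dummy"])["round"].rank(ascending=False,
                                                             method="first")

# Keep only the N most recent rounds per group, then average their points.
out = (df[rank_home <= N]
       .groupby(["team", "home_dummy"])["points"].mean()
       .unstack("home_dummy")
       .add_prefix("mean_home_"))
out["mean_total"] = df[rank_total <= N].groupby("team")["points"].mean()
print(out)
```

Ranking avoids relying on a prior sort at the cost of computing one rank column per grouping; method="first" only matters if a team has duplicate round numbers.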
