Cumsum with Pandas groupby on two columns

Problem description

I have this DataFrame:

H = Home win
D = Draw
A = Away win

            Datetime    HomeTeam            AwayTeam            HG  AG  FT
0   2021-02-17 22:00:00 Colo Colo           U. De Concepcion    1   0   H
1   2021-02-15 14:30:00 Cobresal            U. Espanola         4   1   H
2   2021-02-14 22:00:00 Deportes Iquique    S. Wanderers        2   0   H
3   2021-02-14 22:00:00 La Serena           A. Italiano         0   2   A
4   2021-02-14 22:00:00 O'Higgins           Colo Colo           1   1   D
... ... ... ... ... ... ...

For each match row, I want to add up the team's home wins and away wins.

Code

# Create boolean columns for the cumulative sums
df['HomeWin'] = df['HG'] > df['AG']
df['Draw'] = df['HG'] == df['AG']
df['HomeLoss'] = df['HG'] < df['AG']

# Previous wins of the home team, excluding the current row
home_sum = df.groupby('HomeTeam')['HomeWin'].apply(lambda x: x.shift(fill_value=0).rolling(99, min_periods=1).sum())

# Previous matches of the home team, excluding the current row
home_count = (df.groupby('HomeTeam')['HomeWin'].apply(lambda x: x.shift(fill_value=0).rolling(99, min_periods=1).sum())
              + df.groupby('HomeTeam')['Draw'].apply(lambda x: x.shift(fill_value=0).rolling(99, min_periods=1).sum())
              + df.groupby('HomeTeam')['HomeLoss'].apply(lambda x: x.shift(fill_value=0).rolling(99, min_periods=1).sum()))

# Cumulative wins of the away team (includes the current row)
away_sum = df.groupby('AwayTeam')['HomeLoss'].cumsum()

# Cumulative matches of the away team (includes the current row)
away_count = df.groupby('AwayTeam')['HomeLoss'].cumsum() + df.groupby('AwayTeam')['Draw'].cumsum() + df.groupby('AwayTeam')['HomeWin'].cumsum()
print(away_count)


df['SUM'] = (home_sum + away_sum) / (home_count + away_count)

Output

                Datetime          HomeTeam          AwayTeam  HG  AG FT     1     X     2      SUM
0    2021-02-17 22:00:00         Colo Colo  U. De Concepcion   1   0  H  2.53  3.01  2.80  0.285714

Home_sum = 6
Home_count = 17
Away_sum = 4
Away_count = 18
df['SUM'] = (6 + 4) / (17 + 18)

EXPECTED OUTPUT

Home_sum = 6
Home_count = 17
Away_sum = 3
Away_count = 17

df['SUM'] = (6 + 3) / (17 + 17)

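My problem is that the calculation does not use the matches of the same team; it uses the two teams that happen to be in the same row. In the example, the error is that the away figures come from U. De Concepcion (the AwayTeam of that row) instead of from Colo Colo.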
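To make the cause concrete, here is a minimal sketch of what `groupby('AwayTeam')` returns. The two rows are made-up data (not taken from the table above) and `AwayWin` is an illustrative column name. The result of a grouped `cumsum` keeps the original row index, so the value at row 0 describes whichever team is the AwayTeam of row 0.

```python
import pandas as pd

# Hypothetical two-row example, invented for illustration
df = pd.DataFrame({
    "HomeTeam": ["Colo Colo", "O'Higgins"],
    "AwayTeam": ["U. De Concepcion", "Colo Colo"],
    "HG": [1, 0],
    "AG": [0, 2],
})
df["AwayWin"] = (df["AG"] > df["HG"]).astype(int)

# Grouped cumsum is aligned to the original rows, not to the home team
away_sum = df.groupby("AwayTeam")["AwayWin"].cumsum()
print(away_sum)
# Row 0 holds 0 (U. De Concepcion's away wins),
# row 1 holds 1 (Colo Colo's away wins).
```

Adding `home_sum + away_sum` therefore mixes Colo Colo's home record with U. De Concepcion's away record in row 0, which matches the mismatch described above.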

Tags: python, pandas, dataframe

Solution
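One possible approach, offered as a sketch and not as a definitive answer: reshape the table so that every match appears once per team, compute each team's running totals over all of its matches (home and away together), and then read the value back for the home team of each row. This assumes that "previous" means earlier by `Datetime`, and that the ratio wanted is prior wins divided by prior matches of the row's HomeTeam. The names `long_df`, `side`, `match`, `prev_wins`, `prev_matches` and `ratio` are illustrative only.

```python
import pandas as pd

# The five sample rows shown in the question
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-02-17 22:00:00", "2021-02-15 14:30:00", "2021-02-14 22:00:00",
        "2021-02-14 22:00:00", "2021-02-14 22:00:00",
    ]),
    "HomeTeam": ["Colo Colo", "Cobresal", "Deportes Iquique", "La Serena", "O'Higgins"],
    "AwayTeam": ["U. De Concepcion", "U. Espanola", "S. Wanderers", "A. Italiano", "Colo Colo"],
    "HG": [1, 4, 2, 0, 1],
    "AG": [0, 1, 0, 2, 1],
})

# One row per (match, team): the home view and the away view of each match
home = pd.DataFrame({
    "match": df.index, "Datetime": df["Datetime"], "side": "home",
    "Team": df["HomeTeam"], "Win": (df["HG"] > df["AG"]).astype(int),
})
away = pd.DataFrame({
    "match": df.index, "Datetime": df["Datetime"], "side": "away",
    "Team": df["AwayTeam"], "Win": (df["AG"] > df["HG"]).astype(int),
})
long_df = pd.concat([home, away], ignore_index=True)

# Oldest match first, so that cumulative values mean "so far"
long_df = long_df.sort_values("Datetime", kind="stable")

grouped = long_df.groupby("Team")["Win"]
long_df["prev_wins"] = grouped.cumsum() - long_df["Win"]  # exclude current match
long_df["prev_matches"] = grouped.cumcount()              # matches before this one
long_df["ratio"] = long_df["prev_wins"] / long_df["prev_matches"]  # NaN if no history

# Bring the home team's ratio back onto the original match rows
home_rows = long_df[long_df["side"] == "home"].set_index("match")
df["SUM"] = home_rows["ratio"]
print(df[["HomeTeam", "AwayTeam", "SUM"]])
```

With only these five rows, most teams have no earlier match, so their ratio is NaN. Colo Colo in row 0 has one earlier match (the away draw against O'Higgins), so its value is 0 wins out of 1 match. On the full table, the same steps should count each team's own home and away matches instead of the pair of teams in the row.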