A faster way to get the mean of columns

Problem description

Input:

Home   Away   Home_goals Away_goals 
------------------------------------
Team 1 Team 2 2          1          
Team 3 Team 4 3          5
Team 2 Team 1 5          3
Team 4 Team 3 1          5
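To reproduce the example, the Input table can be built as a DataFrame like this. This is a minimal sketch that assumes the column names are exactly as shown in the table:

```python
import pandas as pd

# Sample data copied from the Input table above
df = pd.DataFrame({
    "Home": ["Team 1", "Team 3", "Team 2", "Team 4"],
    "Away": ["Team 2", "Team 4", "Team 1", "Team 3"],
    "Home_goals": [2, 3, 5, 1],
    "Away_goals": [1, 5, 3, 5],
})
print(df)
```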

Output:

Home   Away   Home_goals Away_goals Mean
------------------------------------------------------
Team 1 Team 2 2          1          5.5 ((2+1+5+3)/2) 
Team 3 Team 4 3          5          7 ((3+5+1+5)/2)
Team 2 Team 1 5          3          5.5 ((2+1+5+3)/2) 
Team 4 Team 3 1          5          7 ((3+5+1+5)/2)
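Stated as a formula (the notation S_i is ours, not the asker's), each row's Mean averages the total goals over every match between the same two teams, whichever side was at home:

```latex
\text{Mean}_i = \frac{1}{|S_i|} \sum_{j \in S_i} \left( \text{Home\_goals}_j + \text{Away\_goals}_j \right),
\qquad
S_i = \{\, j : \{\text{Home}_j, \text{Away}_j\} = \{\text{Home}_i, \text{Away}_i\} \,\}
```

For Team 1 vs Team 2, S contains the first and third rows, so Mean = ((2+1) + (5+3))/2 = 11/2 = 5.5.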

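I need the mean of the total goals scored in head-to-head (H2H) matches between the same two teams. This code should work, but unfortunately it takes a very long time to finish. Is there a faster way to do this?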

import pandas as pd

def fce(team):
    # team is one row of df[["Home", "Away"]]
    team_1 = team.iloc[0]
    team_2 = team.iloc[1]

    # Builds boolean masks over the whole frame for every row (O(n^2) overall)
    new = df[((df["Home"] == team_1) & (df["Away"] == team_2)) |
             ((df["Home"] == team_2) & (df["Away"] == team_1))]
    mean = (new["Home_goals"] + new["Away_goals"]).mean()
    return mean

df["Mean"] = df[["Home", "Away"]].apply(fce, axis=1)
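The slowness comes from the structure of fce: apply calls it once per row, and each call compares the Home and Away columns of the entire frame, so the total work grows roughly with the square of the number of rows. The plain-Python sketch below illustrates the alternative principle (one pass to accumulate totals per unordered pair, then one lookup per row). It is not the answer's method, and the names totals, counts and keys are hypothetical:

```python
import pandas as pd

# Sample data from the question's Input table
df = pd.DataFrame({
    "Home": ["Team 1", "Team 3", "Team 2", "Team 4"],
    "Away": ["Team 2", "Team 4", "Team 1", "Team 3"],
    "Home_goals": [2, 3, 5, 1],
    "Away_goals": [1, 5, 3, 5],
})

# Pass 1: accumulate goal totals and match counts per unordered pair.
# Sorting the two names gives (A, B) and (B, A) the same dictionary key.
totals = {}
counts = {}
for home, away, hg, ag in df[["Home", "Away", "Home_goals", "Away_goals"]].itertuples(index=False):
    key = tuple(sorted((home, away)))
    totals[key] = totals.get(key, 0) + hg + ag
    counts[key] = counts.get(key, 0) + 1

# Pass 2: one dictionary lookup per row, so the job is roughly linear in len(df)
keys = [tuple(sorted(pair)) for pair in zip(df["Home"], df["Away"])]
df["Mean"] = [totals[k] / counts[k] for k in keys]
print(df)
```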

Thanks

Tags: python, pandas, performance

Solution


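You can sort the Home and Away columns row by row with np.sort, build a DataFrame from the result, add a column with the summed goals, and use mean with GroupBy.transform to create the new column: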

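import numpy as np
import pandas as pd

# Sort each (Home, Away) pair so that A-B and B-A share the same key
a = np.sort(df[["Home", "Away"]], axis=1)
df['Mean'] = (pd.DataFrame(a, index=df.index)
                .assign(sum = df[['Home_goals','Away_goals']].sum(axis=1))  # total goals per match
                .groupby([0,1])['sum']   # group by the sorted pair (columns 0 and 1)
                .transform('mean'))      # mean per pair, broadcast back to every row
print(df)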
     Home    Away  Home_goals  Away_goals  Mean
0  Team 1  Team 2           2           1   5.5
1  Team 3  Team 4           3           5   7.0
2  Team 2  Team 1           5           3   5.5
3  Team 4  Team 3           1           5   7.0
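Why transform rather than a plain mean: a grouped .mean() returns one value per team pair, while .transform('mean') returns a result aligned with the original index, one value per match, which is what a new column needs. A minimal sketch on the sample data (the names key_df, grouped, per_pair and per_row are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Home": ["Team 1", "Team 3", "Team 2", "Team 4"],
    "Away": ["Team 2", "Team 4", "Team 1", "Team 3"],
    "Home_goals": [2, 3, 5, 1],
    "Away_goals": [1, 5, 3, 5],
})

# Sorted pair in columns 0 and 1, total goals per match in "sum"
a = np.sort(df[["Home", "Away"]].to_numpy(), axis=1)
key_df = pd.DataFrame(a, index=df.index).assign(
    sum=df[["Home_goals", "Away_goals"]].sum(axis=1)
)
grouped = key_df.groupby([0, 1])["sum"]

per_pair = grouped.mean()            # one row per pair, indexed by (team, team)
per_row = grouped.transform("mean")  # one row per match, indexed like df

print(per_pair)
print(per_row)
```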

Alternatively, assign the sorted arrays as new columns:

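# Same idea, but the sorted pair goes into columns t1 and t2 of a temporary
# copy of df (assign returns a new DataFrame and does not modify df)
a = np.sort(df[["Home", "Away"]], axis=1)
df['Mean'] = (df.assign(t1 = a[:, 0],
                        t2 = a[:, 1],
                        sum = df[['Home_goals','Away_goals']].sum(axis=1))
                .groupby(['t1','t2'])['sum']
                .transform('mean'))
print(df)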
     Home    Away  Home_goals  Away_goals  Mean
0  Team 1  Team 2           2           1   5.5
1  Team 3  Team 4           3           5   7.0
2  Team 2  Team 1           5           3   5.5
3  Team 4  Team 3           1           5   7.0
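A further variation on the same idea, sketched here as an alternative rather than as the answer's code: Series.groupby accepts a list of arrays with the same length as the Series as grouping keys, so the sorted pair can be passed directly without building an intermediate DataFrame or extra columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Home": ["Team 1", "Team 3", "Team 2", "Team 4"],
    "Away": ["Team 2", "Team 4", "Team 1", "Team 3"],
    "Home_goals": [2, 3, 5, 1],
    "Away_goals": [1, 5, 3, 5],
})

# Sorted (Home, Away) pair per row: column 0 holds the alphabetically first name
a = np.sort(df[["Home", "Away"]].to_numpy(), axis=1)

# Total goals per match
total = df["Home_goals"] + df["Away_goals"]

# Group the totals by the two key arrays; transform keeps df's index for assignment
df["Mean"] = total.groupby([a[:, 0], a[:, 1]]).transform("mean")
print(df)
```

Any consistent ordering would work here; the sort only exists to make (A, B) and (B, A) identical keys.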
