python - Faster way to get the mean of columns
Problem description
Input:
Home     Away     Home_goals  Away_goals
-----------------------------------------
Team 1   Team 2   2           1
Team 3   Team 4   3           5
Team 2   Team 1   5           3
Team 4   Team 3   1           5
Output:
Home     Away     Home_goals  Away_goals  Mean
-------------------------------------------------------------------
Team 1   Team 2   2           1           5.5  ((2+1+5+3)/2)
Team 3   Team 4   3           5           7    ((3+5+1+5)/2)
Team 2   Team 1   5           3           5.5  ((2+1+5+3)/2)
Team 4   Team 3   1           5           7    ((3+5+1+5)/2)
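I need the mean of the total goals scored in the head-to-head (H2H) matches between the two teams of each row. The code below should work, but unfortunately it takes a very long time to finish. Is there a faster way to do this?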
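Put as a formula: for two teams A and B, let M_AB be the set of their matches in either home/away order, and let h_m and a_m be the home and away goals of match m (notation introduced here for illustration only). The requested value, checked against the first row of the table, is:

```latex
\mathrm{Mean}_{AB} = \frac{1}{|M_{AB}|} \sum_{m \in M_{AB}} \left( h_m + a_m \right),
\qquad
\mathrm{Mean}_{\text{Team 1},\,\text{Team 2}} = \frac{(2+1) + (5+3)}{2} = 5.5
```

So the divisor 2 in the table is the number of head-to-head matches for that pair, not a fixed constant.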
def fce(team):
    team_1 = team.iloc[0]
    team_2 = team.iloc[1]
    # Scans the whole DataFrame once per row, so total work grows quadratically with the row count
    new = df[(df["Home"] == team_1) & (df["Away"] == team_2) | (df["Home"] == team_2) & (df["Away"] == team_1)]
    mean = (new["Home_goals"] + new["Away_goals"]).mean()
    return mean

df["Mean"] = df[["Home", "Away"]].apply(fce, axis=1)
Thanks.
Solution
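You can sort the Home and Away columns row-wise, build a DataFrame from the result, add a column with the total goals per match, and use GroupBy.transform with mean to create the new column: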
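import numpy as np
import pandas as pd

# Sort each (Home, Away) pair so "Team 1 vs Team 2" and "Team 2 vs Team 1" share one key
a = np.sort(df[["Home", "Away"]], axis=1)
df['Mean'] = (pd.DataFrame(a, index=df.index)
                .assign(sum = df[['Home_goals','Away_goals']].sum(axis=1))
                .groupby([0,1])['sum']
                .transform('mean'))
print(df)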
Home Away Home_goals Away_goals Mean
0 Team 1 Team 2 2 1 5.5
1 Team 3 Team 4 3 5 7.0
2 Team 2 Team 1 5 3 5.5
3 Team 4 Team 3 1 5 7.0
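To check that the sorted-pair approach gives the same numbers as the row-wise apply from the question, here is a self-contained sketch on hypothetical random data (10 teams, 200 matches). The names `mean_vectorized` and `mean_rowwise` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: 200 random matches among 10 teams.
rng = np.random.default_rng(0)
teams = np.array([f"Team {i}" for i in range(1, 11)])
home = rng.choice(teams, size=200)
# Pick an away team that differs from the home team.
away = np.array([rng.choice(teams[teams != h]) for h in home])
df = pd.DataFrame({
    "Home": home,
    "Away": away,
    "Home_goals": rng.integers(0, 6, size=200),
    "Away_goals": rng.integers(0, 6, size=200),
})

def mean_vectorized(df):
    # Row-wise sort makes the pair key independent of home/away order.
    a = np.sort(df[["Home", "Away"]], axis=1)
    return (pd.DataFrame(a, index=df.index)
              .assign(sum=df[["Home_goals", "Away_goals"]].sum(axis=1))
              .groupby([0, 1])["sum"]
              .transform("mean"))

def mean_rowwise(df):
    # The question's approach: one boolean scan of df per row.
    def fce(row):
        t1, t2 = row["Home"], row["Away"]
        mask = (((df["Home"] == t1) & (df["Away"] == t2))
                | ((df["Home"] == t2) & (df["Away"] == t1)))
        h2h = df[mask]
        return (h2h["Home_goals"] + h2h["Away_goals"]).mean()
    return df[["Home", "Away"]].apply(fce, axis=1)

fast = mean_vectorized(df)
slow = mean_rowwise(df)
print(np.allclose(fast, slow))  # True
```

The vectorized version does one sort and one grouped pass over the data, while the row-wise version filters the whole frame once per row, which is where the slowdown comes from.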
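Alternatively, assign the sorted pair as new columns and group by them:
a = np.sort(df[["Home", "Away"]], axis=1)
# t1/t2 and sum exist only in the temporary frame returned by assign; df keeps just Mean
df['Mean'] = (df.assign(t1 = a[:, 0],
                        t2 = a[:, 1],
                        sum = df[['Home_goals','Away_goals']].sum(axis=1))
                .groupby(['t1','t2'])['sum']
                .transform('mean'))
print(df)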
Home Away Home_goals Away_goals Mean
0 Team 1 Team 2 2 1 5.5
1 Team 3 Team 4 3 5 7.0
2 Team 2 Team 1 5 3 5.5
3 Team 4 Team 3 1 5 7.0
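A further variant, not from the original answer, builds the same order-independent key with `Series.where` instead of `np.sort`: the lexicographically smaller team name goes into `t1` and the larger into `t2`. It is a sketch assuming the column names from the question's input table.

```python
import pandas as pd

# Input table from the question.
df = pd.DataFrame({
    "Home": ["Team 1", "Team 3", "Team 2", "Team 4"],
    "Away": ["Team 2", "Team 4", "Team 1", "Team 3"],
    "Home_goals": [2, 3, 5, 1],
    "Away_goals": [1, 5, 3, 5],
})

# Smaller name in t1, larger in t2, so both orderings of a pair match.
home_first = df["Home"] <= df["Away"]
t1 = df["Home"].where(home_first, df["Away"])
t2 = df["Away"].where(home_first, df["Home"])

# Total goals per match, averaged within each (t1, t2) pair.
total = df["Home_goals"] + df["Away_goals"]
df["Mean"] = total.groupby([t1, t2]).transform("mean")
print(df["Mean"].tolist())  # [5.5, 7.0, 5.5, 7.0]
```

The ordering of the names only needs to be consistent, not meaningful, so plain string comparison is enough here.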