python - 计算由两列过滤的平均数据框
问题描述
我有这个数据框
Unnamed: 0 Datetime HomeTeam AwayTeam Ball PossessionMatch_H Ball PossessionMatch_A
0 0 2021-05-24 02:30:00 U. De Chile Everton 68 32
1 1 2021-05-23 21:00:00 Huachipato Colo Colo 48 52
2 2 2021-05-23 18:30:00 Melipilla Antofagasta 47 53
3 3 2021-05-23 02:30:00 U. Espanola U. Catolica 37 63
4 4 2021-05-23 00:00:00 S. Wanderers O'Higgins 29 71
... ... ... ... ... ... ...
57 57 2021-03-28 15:45:00 Palestino Antofagasta 58 42
58 58 2021-03-28 01:00:00 U. Espanola S. Wanderers 50 50
59 59 2021-03-27 22:30:00 Colo Colo Union La Calera 58 42
60 60 2021-03-27 20:00:00 Everton O'Higgins 54 46
61 61 2021-03-27 15:00:00 Curico Unido Melipilla 41 59
我想将其拆分为多个数据帧并在“HomeTeam”和“AwayTeam”中应用两个标准,然后计算 Ball Possession 的平均值并将其放入新列“Ball PossessionMatch_H/MP”中(如果球队在“HomeTeam”中)和“Ball PossessionMatch_A/MP”,如果球队在“AwayTeam”
代码:
teams = pd.unique(df[['HomeTeam', 'AwayTeam']].values.ravel('K'))
df_list3 = []
for team in teams:
if team in df['HomeTeam']:
current_df = df[(df.HomeTeam == team) | (df.AwayTeam == team)].copy()
current_df["Ball PossessionMatch_H/MP"] = current_df.apply(
lambda x: (
current_df[(current_df.HomeTeam == team)].loc[x.name + 1 :, "Ball PossessionMatch_H"].sum()
+ current_df[(current_df.AwayTeam == team)].loc[x.name + 1 :, "Ball PossessionMatch_A"].sum()
)
/ (
current_df[(current_df.HomeTeam == team)].loc[x.name + 1 :, "Ball PossessionMatch_H"].size
+ current_df[(current_df.AwayTeam == team)].loc[x.name + 1 :, "Ball PossessionMatch_A"].size
),
axis=1,
)
else:
current_df = df[(df.HomeTeam == team) | (df.AwayTeam == team)].copy()
current_df["Ball PossessionMatch_A/MP"] = current_df.apply(
lambda x: (
current_df[(current_df.HomeTeam == team)].loc[x.name + 1 :, "Ball PossessionMatch_H"].sum()
+ current_df[(current_df.AwayTeam == team)].loc[x.name + 1 :, "Ball PossessionMatch_A"].sum()
)
/ (
current_df[(current_df.HomeTeam == team)].loc[x.name + 1 :, "Ball PossessionMatch_H"].size
+ current_df[(current_df.AwayTeam == team)].loc[x.name + 1 :, "Ball PossessionMatch_A"].size
),
axis=1,
)
df_list3.append(current_df)
df = pd.concat(df_list3)
print(df)
它不考虑if
并仅创建具有计算值的“Ball PossessionMatch_A/MP”列并复制行,我会将append
结果复制到原始数据框。
预期结果:
Unnamed: 0 Datetime HomeTeam AwayTeam Ball PossessionMatch_H Ball PossessionMatch_A Ball PossessionMatch_H/MP Ball PossessionMatch_A/MP
0 0 2021-05-24 02:30:00 U. De Chile Everton 68 32 50.33 43.5
解决方案
下面的呢?
hometeam_count = df.groupby("HomeTeam")["Ball PossessionMatch_H"].count()
hometeam_sum = df.groupby("HomeTeam")["Ball PossessionMatch_H"].sum()
awayteam_count = df.groupby("AwayTeam")["Ball PossessionMatch_A"].count()
awayteam_sum = df.groupby("AwayTeam")["Ball PossessionMatch_A"].sum()
df["Ball PossessionMatch_H/MP"] = df["HomeTeam"].apply(lambda x: ((hometeam_sum.loc[x] if x in hometeam_sum.index else 0) + (awayteam_sum.loc[x] if x in awayteam_sum.index else 0)) / ((hometeam_count.loc[x] if x in hometeam_count.index else 0) + (awayteam_count.loc[x] if x in awayteam_count.index else 0)))
df["Ball PossessionMatch_A/MP"] = df["AwayTeam"].apply(lambda x: ((hometeam_sum.loc[x] if x in hometeam_sum.index else 0) + (awayteam_sum.loc[x] if x in awayteam_sum.index else 0)) / ((hometeam_count.loc[x] if x in hometeam_count.index else 0) + (awayteam_count.loc[x] if x in awayteam_count.index else 0)))
上面的代码利用该groupby
函数来计算每个团队的 和sum
以及单独的。
代码的第二部分基本上是分别查找 ( ) 对应的和值,并按预期执行平均计算。count
Ball PossessionMatch_A
Ball PossessionMatch_H
loc
sum
count
HomeTeam
AwayTeam
正如你所知道的,两者的代码HomeTeam
几乎AwayTeam
相同,也被清理了,这样我们就不会重复你自己了。我这样做纯粹是为了便于阅读。
尽管这并不能解决问题(至于为什么没有在您创建的列中填写值),但我提供了该问题的替代解决方案,我认为该解决方案更直观且更具可读性。
推荐阅读
- javascript - 如何替换结构 this.props.{myName} 中的最后一部分
- python - 如何检查pyodbc游标是否为空?
- python - 为在 mixins 中返回 self 的方法注释返回类型
- javascript - Apps Script WebApp 没有一致地处理参数
- c++ - 用于随机数据访问的最有效文件类型
- c++ - 为什么 `make_ 的标准首选圆括号初始化
`? - python - 加速 Python 中的列表处理
- python - Django Createsuperuser 如何知道如何解析我的日期字段
- sharepoint - 使用 SharePoint 的 REST API 扩展文件夹的人员字段
- javascript - 如何在不使用 XMLHttpRequest 的情况下跟踪 404 错误