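python - Pandas - Conditional cumulative sum across two columns
Problem description
I want to compute points for football teams. I have the points for every match, and a team earns points whether it plays at home or away. I can't work out how to get each team's running total (home + away points).
This is what I have so far: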
import pandas as pd

df = pd.DataFrame([
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
])
df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']
# Cumulative sum for the home / away team, shifted by 1 row
# so that each row only contains points from earlier matches
df["H_cumsum"] = df.groupby(['H_team', "Year"])['H_points'].transform(
    lambda x: x.cumsum().shift())
df["A_cumsum"] = df.groupby(['A_team', "Year"])['A_points'].transform(
    lambda x: x.cumsum().shift())
print(df)
        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       NaN       NaN
2        Malmo  Gothenburg  2018         0         3       1.0       1.0
3   Gothenburg       Malmo  2018         1         1       1.0       1.0
4   Gothenburg       Malmo  2018         0         3       2.0       2.0
5   Gothenburg       Malmo  2018         1         1       2.0       5.0
6   Gothenburg       Malmo  2018         0         3       3.0       6.0
7        Malmo  Gothenburg  2018         0         3       1.0       4.0
8   Gothenburg       Malmo  2018         1         1       3.0       9.0
9        Malmo  Gothenburg  2018         0         3       1.0       7.0
10       Malmo  Gothenburg  2018         1         1       1.0      10.0
11       Malmo  Gothenburg  2018         0         3       2.0      11.0
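This table gives me each team's cumulative home points and cumulative away points separately, shifted by 1 row. But I need the total points across both home and away matches: H_cumsum and A_cumsum should each include the team's earlier points from home and away games alike.
Expected output: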
row 0: Malmo = NaN, Gothenburg = NaN
row 1: Gothenburg = 1, Malmo = 1
row 2: Malmo = 1 + 1 = 2, Gothenburg = 1 + 1 = 2
row 3: Gothenburg = 1 + 1 + 3 = 5, Malmo = 1 + 1 + 0 = 2
row 4: Gothenburg = 1 + 1 + 3 + 1 = 6, Malmo = 1 + 1 + 0 + 1 = 3
And so on...
The last row, row 11, should be:
H_cumsum (team Malmo) = 12, A_cumsum (team Gothenburg) = 15
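The expected values above can be checked by hand with a plain running tally, independent of pandas. This is only a sketch for verifying the arithmetic: the names `matches`, `totals` and `prior` are made up for illustration, and Year is ignored because every match in the sample is from 2018.

```python
# Each tuple: (home team, away team, home points, away points), in match order.
matches = [
    ("Gothenburg", "Malmo", 1, 1),
    ("Malmo", "Gothenburg", 1, 1),
    ("Malmo", "Gothenburg", 0, 3),
    ("Gothenburg", "Malmo", 1, 1),
    ("Gothenburg", "Malmo", 0, 3),
    ("Gothenburg", "Malmo", 1, 1),
    ("Gothenburg", "Malmo", 0, 3),
    ("Malmo", "Gothenburg", 0, 3),
    ("Gothenburg", "Malmo", 1, 1),
    ("Malmo", "Gothenburg", 0, 3),
    ("Malmo", "Gothenburg", 1, 1),
    ("Malmo", "Gothenburg", 0, 3),
]

totals = {}  # points collected so far by each team, home and away combined
prior = []   # per match: (home team's total before kickoff, away team's total)

for home, away, h_pts, a_pts in matches:
    # Record what each team had BEFORE this match (None for a first match).
    prior.append((totals.get(home), totals.get(away)))
    totals[home] = totals.get(home, 0) + h_pts
    totals[away] = totals.get(away, 0) + a_pts

for row, (h_before, a_before) in enumerate(prior):
    print(row, h_before, a_before)
```

Row 3 comes out as (5, 2) and row 11 as (12, 15), with `None` standing in for the NaN in row 0.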
Solution
I found a solution using stack, but it is not a good one:
import pandas as pd

df = pd.DataFrame([
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
])
# Two-level column labels: (Team, Home), (Team, Away), (Year, Year), ...
df.columns = [['Team', 'Team', "Year", 'Points', 'Points'],
              ['Home', 'Away', 'Year', 'Home', 'Away']]
# Move the Home/Away level into the row index: one row per match and side
d1 = df.stack()
# Points collected before each match, per team (home and away combined);
# transform keeps the result aligned with d1's index
total = d1.groupby('Team').Points.transform(lambda x: x.shift().cumsum())
df = d1.assign(Total=total).unstack()
print(df)
Points Team Year Total
Away Home Year Away Home Year Away Home Year Away Home Year
0 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 NaN NaN NaN
1 1.0 1.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 1.0 1.0 NaN
2 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 2.0 2.0 NaN
3 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 2.0 5.0 NaN
4 3.0 0.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 3.0 6.0 NaN
5 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 6.0 6.0 NaN
6 3.0 0.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 7.0 7.0 NaN
7 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 7.0 10.0 NaN
8 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 10.0 10.0 NaN
9 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 11.0 11.0 NaN
10 1.0 1.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 14.0 11.0 NaN
11 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 15.0 12.0 NaN
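The points under Total/Away and Total/Home are correct. However, all the extra, unnecessary columns make the table very hard to read. (Each row also has 10 more columns that are not shown in this example, so it gets really messy.)
The desired output is: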
        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       1.0       1.0
2        Malmo  Gothenburg  2018         0         3       2.0       2.0
3   Gothenburg       Malmo  2018         1         1       5.0       2.0
4   Gothenburg       Malmo  2018         0         3       6.0       3.0
5   Gothenburg       Malmo  2018         1         1       6.0       6.0
6   Gothenburg       Malmo  2018         0         3       7.0       7.0
7        Malmo  Gothenburg  2018         0         3      10.0       7.0
8   Gothenburg       Malmo  2018         1         1      10.0      10.0
9        Malmo  Gothenburg  2018         0         3      11.0      11.0
10       Malmo  Gothenburg  2018         1         1      11.0      14.0
11       Malmo  Gothenburg  2018         0         3      12.0      15.0
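One possible way to get this layout without the extra MultiIndex columns is to do the running total in a separate long-format helper frame and copy only the two result columns back. The following is a sketch rather than a definitive answer: the helper names (`long`, `side`, `match`, `running`, `before`) are made up for illustration, and it assumes the row order of `df` is the chronological match order.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
    ],
    columns=["H_team", "A_team", "Year", "H_points", "A_points"],
)

# Long format: one row per (match, team), built from a home and an away view.
home = df[["H_team", "Year", "H_points"]].rename(
    columns={"H_team": "team", "H_points": "points"}).assign(side="H")
away = df[["A_team", "Year", "A_points"]].rename(
    columns={"A_team": "team", "A_points": "points"}).assign(side="A")
long = pd.concat([home, away]).rename_axis("match").reset_index()

# Restore match order (assumption: df's row order is chronological).
long = long.sort_values("match", kind="stable").reset_index(drop=True)

# Running total per team and year, then shift by one match within each group
# so every row only sees points from earlier matches (first match -> NaN).
keys = ["team", "Year"]
long["running"] = long.groupby(keys)["points"].cumsum()
long["before"] = long.groupby(keys)["running"].shift()

# Copy the results back; assignment aligns on the original row labels.
is_home = long["side"] == "H"
df["H_cumsum"] = long.loc[is_home].set_index("match")["before"]
df["A_cumsum"] = long.loc[~is_home].set_index("match")["before"]
print(df)
```

Because the helper frame only carries the team, year and points columns, any other columns in the original `df` are left untouched, which avoids the wide stacked/unstacked table.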