python - 将分组的熊猫数据应用回原始数据框
问题描述
我正在使用下面的数据框:
这些是我试图按游戏分组的国际象棋游戏,然后根据该游戏中的移动数对每场游戏执行一个功能......
game_id move_number colour avg_centi
0 03gDhPWr 1 white NaN
1 03gDhPWr 2 black 37.0
2 03gDhPWr 3 white 61.0
3 03gDhPWr 4 black -5.0
4 03gDhPWr 5 white 26.0
5 03gDhPWr 6 black 31.0
6 03gDhPWr 7 white -2.0
... ... ... ... ...
110091 zzaiRa7s 34 black NaN
110092 zzaiRa7s 35 white NaN
110093 zzaiRa7s 36 black NaN
110094 zzaiRa7s 37 white NaN
110095 zzaiRa7s 38 black NaN
110096 zzaiRa7s 39 white NaN
110097 zzaiRa7s 40 black NaN
具体来说,我正在使用pd.cut
创建一个新列 ,game_phase
其中列出了给定的移动是在开局、中局还是残局中进行的。
我正在使用以下代码来实现这一点。请注意,每个游戏必须根据该游戏中的移动总数划分为opening
、middlegame
和箱。endgame
def define_move_phase(x):
bins = (0, round(x['move_number'].max() * 1/3), round(x['move_number'].max() * 2/3), x['move_number'].max())
phases = ["opening", "middlegame", "endgame"]
try:
x.loc[:, 'phase'] = pd.cut(x['move_number'], bins, labels=phases)
except ValueError:
x.loc[:, 'phase'] = None
print(x)
df.groupby('game_id').apply(define_move_phase)
该函数中的print
语句显示该函数正在处理各个组(见下文),但它不会将该phase
列应用回原始数据帧。
game_id move_number colour avg_centi phase
0 03gDhPWr 1 white NaN opening
1 03gDhPWr 2 black 37.0 opening
2 03gDhPWr 3 white 61.0 opening
3 03gDhPWr 4 black -5.0 opening
4 03gDhPWr 5 white 26.0 opening
5 03gDhPWr 6 black 31.0 opening
6 03gDhPWr 7 white -2.0 opening
.. ... ... ... ... ...
54 03gDhPWr 55 white 58.0 endgame
55 03gDhPWr 56 black 26.0 endgame
56 03gDhPWr 57 white 116.0 endgame
57 03gDhPWr 58 black 2000.0 endgame
58 03gDhPWr 59 white 0.0 endgame
59 03gDhPWr 60 black 0.0 endgame
60 03gDhPWr 61 white NaN endgame
[61 rows x 5 columns]
game_id move_number colour avg_centi phase
0 03gDhPWr 1 white NaN opening
1 03gDhPWr 2 black 37.0 opening
2 03gDhPWr 3 white 61.0 opening
3 03gDhPWr 4 black -5.0 opening
4 03gDhPWr 5 white 26.0 opening
5 03gDhPWr 6 black 31.0 opening
6 03gDhPWr 7 white -2.0 opening
.. ... ... ... ... ...
54 03gDhPWr 55 white 58.0 endgame
55 03gDhPWr 56 black 26.0 endgame
56 03gDhPWr 57 white 116.0 endgame
57 03gDhPWr 58 black 2000.0 endgame
58 03gDhPWr 59 white 0.0 endgame
59 03gDhPWr 60 black 0.0 endgame
60 03gDhPWr 61 white NaN endgame
[61 rows x 5 columns]
ETC...
我想将新phase
列应用回原始数据框或将分组的数据框再次取消组合为一个大数据框。这样做的最佳方法是什么?
解决方案
您的函数没有返回语句