首页 > 解决方案 > 将分组的熊猫数据应用回原始数据框

问题描述

我正在使用下面的数据框:

这些是我试图按游戏分组的国际象棋游戏,然后根据该游戏中的移动数对每场游戏执行一个功能......

        game_id     move_number colour  avg_centi
0       03gDhPWr    1           white   NaN
1       03gDhPWr    2           black   37.0
2       03gDhPWr    3           white   61.0
3       03gDhPWr    4           black   -5.0
4       03gDhPWr    5           white   26.0
5       03gDhPWr    6           black   31.0
6       03gDhPWr    7           white   -2.0
... ... ... ... ...
110091  zzaiRa7s    34          black   NaN
110092  zzaiRa7s    35          white   NaN
110093  zzaiRa7s    36          black   NaN
110094  zzaiRa7s    37          white   NaN
110095  zzaiRa7s    38          black   NaN
110096  zzaiRa7s    39          white   NaN
110097  zzaiRa7s    40          black   NaN

具体来说,我正在使用pd.cut创建一个新列 ,game_phase其中列出了给定的移动是在开局、中局还是残局中进行的。

我正在使用以下代码来实现这一点。请注意,每个游戏必须根据该游戏中的移动总数划分为openingmiddlegame和箱。endgame

def define_move_phase(x):
    bins = (0, round(x['move_number'].max() * 1/3), round(x['move_number'].max() * 2/3), x['move_number'].max())    
    phases = ["opening", "middlegame", "endgame"]
    try:
        x.loc[:, 'phase'] = pd.cut(x['move_number'], bins, labels=phases)
    except ValueError:
        x.loc[:, 'phase'] = None
    print(x)

df.groupby('game_id').apply(define_move_phase)

该函数中的print语句显示该函数正在处理各个组(见下文),但它不会将该phase列应用回原始数据帧。

     game_id  move_number colour  avg_centi    phase
0   03gDhPWr            1  white        NaN  opening
1   03gDhPWr            2  black       37.0  opening
2   03gDhPWr            3  white       61.0  opening
3   03gDhPWr            4  black       -5.0  opening
4   03gDhPWr            5  white       26.0  opening
5   03gDhPWr            6  black       31.0  opening
6   03gDhPWr            7  white       -2.0  opening
..       ...          ...    ...        ...      ...
54  03gDhPWr           55  white       58.0  endgame
55  03gDhPWr           56  black       26.0  endgame
56  03gDhPWr           57  white      116.0  endgame
57  03gDhPWr           58  black     2000.0  endgame
58  03gDhPWr           59  white        0.0  endgame
59  03gDhPWr           60  black        0.0  endgame
60  03gDhPWr           61  white        NaN  endgame

[61 rows x 5 columns]
     game_id  move_number colour  avg_centi    phase
0   03gDhPWr            1  white        NaN  opening
1   03gDhPWr            2  black       37.0  opening
2   03gDhPWr            3  white       61.0  opening
3   03gDhPWr            4  black       -5.0  opening
4   03gDhPWr            5  white       26.0  opening
5   03gDhPWr            6  black       31.0  opening
6   03gDhPWr            7  white       -2.0  opening
..       ...          ...    ...        ...      ...
54  03gDhPWr           55  white       58.0  endgame
55  03gDhPWr           56  black       26.0  endgame
56  03gDhPWr           57  white      116.0  endgame
57  03gDhPWr           58  black     2000.0  endgame
58  03gDhPWr           59  white        0.0  endgame
59  03gDhPWr           60  black        0.0  endgame
60  03gDhPWr           61  white        NaN  endgame

[61 rows x 5 columns]

ETC...

我想将新phase列应用回原始数据框或将分组的数据框再次取消组合为一个大数据框。这样做的最佳方法是什么?

标签: pythonpandas

解决方案


您的函数没有返回语句


推荐阅读