首页 > 解决方案 > 有没有找到由当前单元格值定义的最后一行的最佳方法?

问题描述

我有一个熊猫数据框,其中包含有关体育赛事的数据。假设您在其中看到了一个winner_id、一个loser_id和一个match_id。我正在尝试查找具有相同 ID 的先前包含的获胜者的最后一个索引。预期的数据框请参见此处:

在此处输入图像描述

d = {'winner':["A","B","C","A","A","C","B","D"], 'loser':["B","C","D","D","D","B","A","C"], 'id':[1,2,3,4,5,6,7,8], 'id_of_last_winner:' ["", 0, 1, 0, 3, 2, 5, 4]}
df = pd.DataFrame(d)
df

如果我遍历列,它的表现非常糟糕。我的代码的期望应该是这样的: id_of_last_winner

等等...

所以第一个直觉是我用 for 循环遍历失败者列,然后将当前元素与失败者和获胜者列中的其他元素进行比较。这很简单,但执行起来很糟糕,因为每次迭代都包含另外 2 次迭代。有没有更好的方法来加快这个过程?

我满怀希望,因为我发现

df['id_of_last_winner'] = data.groupby('winner')['id'].shift()

但这仅检查失败者列。有更好的主意吗?提前致谢!

标签: pythonpandasdataframe

解决方案


您根据需要让它有点混乱,id但预期的输出使用index. 这是一个使用示例id

# create a list of players
players = list(set(df.winner).union(set(df.loser)) )

# create last game's id for each player
for player in players:
    df[player] = df.id.where((df.winner==player) | (df.loser==player) ).\
                        ffill().shift()

# here's our result
df['winner_last_game'] = df.apply(lambda r: r[r.winner], axis=1)

显然,如果您有大量玩家,它就不起作用,但它应该适用于数百名玩家。这是输出

+---+----+--------+-------+-------------------+-----+-----+-----+-----+------------------+
|   | id | winner | loser | id_of_last_winner |  A  |  C  |  D  |  B  | winner_last_game |
+---+----+--------+-------+-------------------+-----+-----+-----+-----+------------------+
| 0 |  1 | A      | B     |                   | NaN | NaN | NaN | NaN | NaN              |
| 1 |  2 | B      | C     |                 0 | 1.0 | NaN | NaN | 1.0 | 1.0              |
| 2 |  3 | C      | D     |                 1 | 1.0 | 2.0 | NaN | 2.0 | 2.0              |
| 3 |  4 | A      | D     |                 0 | 1.0 | 3.0 | 3.0 | 2.0 | 1.0              |
| 4 |  5 | A      | D     |                 3 | 4.0 | 3.0 | 4.0 | 2.0 | 4.0              |
| 5 |  6 | C      | B     |                 2 | 5.0 | 3.0 | 5.0 | 2.0 | 3.0              |
| 6 |  7 | B      | A     |                 5 | 5.0 | 6.0 | 5.0 | 6.0 | 6.0              |
| 7 |  8 | D      | C     |                 4 | 7.0 | 6.0 | 5.0 | 7.0 | 5.0              |
+---+----+--------+-------+-------------------+-----+-----+-----+-----+------------------+

推荐阅读