Keep rows up to the first occurrence of a value in each group

Problem description

Here is a simplified example of my pandas DataFrame:

     User  Binary
0   UserA       0
1   UserA       0
2   UserA       0
3   UserA       1
4   UserA       0
5   UserA       1
6   UserA       0
7   UserA       0
8   UserB       0
9   UserB       0
10  UserB       0
11  UserB       0
12  UserB       0
13  UserB       1
14  UserB       1
15  UserB       0
16  UserC       0
17  UserC       0

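For each user, I want to drop every row that comes after the first occurrence of Binary=1. Note that some users never have Binary=1, such as UserC in this example; all of their rows should be kept.

The expected output looks like this: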

     User  Binary
0   UserA       0
1   UserA       0
2   UserA       0
3   UserA       1
8   UserB       0
9   UserB       0
10  UserB       0
11  UserB       0
12  UserB       0
13  UserB       1
16  UserC       0
17  UserC       0

Tags: python, pandas

Solution


Here is one approach that uses groupby and transform with a custom function:

# flag rows where Binary equals 1, then group those flags by User
g = df.Binary.eq(1).groupby(df.User)
# per group, broadcast the index label of the first True (idxmax),
# or the group's last index label if it contains no True at all
m = g.transform(lambda x: x.idxmax() if x.any() else x.index[-1])
# keep rows whose index label is less than or equal to that cutoff
# (this comparison relies on the index being sorted in increasing order,
# as with the default RangeIndex)
out = df[df.index <= m]

print(out)

     User  Binary
0   UserA       0
1   UserA       0
2   UserA       0
3   UserA       1
8   UserB       0
9   UserB       0
10  UserB       0
11  UserB       0
12  UserB       0
13  UserB       1
16  UserC       0
17  UserC       0
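
The approach above compares index labels, so it depends on the index being sorted. As a possible alternative (not part of the original answer), the same filter can be sketched with a per-group cumulative sum, which only looks at row order within each user. The variable names `is_one`, `ones_so_far` and `ones_before` are illustrative, and the DataFrame is rebuilt from the question's sample data so the snippet runs on its own:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User": ["UserA"] * 8 + ["UserB"] * 8 + ["UserC"] * 2,
    "Binary": [0, 0, 0, 1, 0, 1, 0, 0,
               0, 0, 0, 0, 0, 1, 1, 0,
               0, 0],
})

# 1 where Binary equals 1, else 0.
is_one = df["Binary"].eq(1).astype(int)
# Running count of 1s within each user, including the current row.
ones_so_far = is_one.groupby(df["User"]).cumsum()
# Number of 1s seen strictly before the current row.
ones_before = ones_so_far - is_one
# Keep a row only if no 1 has appeared earlier for that user.
out = df[ones_before.eq(0)]

print(out)
```

Users with no Binary=1 at all keep every row, because their running count never leaves zero.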
