How do I count in a DataFrame using unique values and a condition?

Problem description

Take this df:

import pandas as pd

df = pd.DataFrame({'client_id':[0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'key':['0_382','0_382','0_356','1_365',float('nan'),'1_365',float('nan'),'2_284','2_405'],
                   'operation':['buy','sell','sell','buy','transfer','buy','fee','buy','buy']})
   client_id    key operation
0          0  0_382       buy
1          0  0_382      sell
2          0  0_356      sell
3          1  1_365       buy
4          1    NaN  transfer
5          1  1_365       buy
6          2    NaN       fee
7          2  2_284       buy
8          2  2_405       buy

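I need to create a column named pos_id that gives each row an incremental value (1, 2, 3...) for each unique value of key within a client_id, with a condition that skips rows whose operation is transfer or fee.

The result should look like this: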

   client_id    key operation pos_id
0          0  0_382       buy      1
1          0  0_382      sell      1
2          0  0_356      sell      2
3          1  1_365       buy      1
4          1    NaN  transfer    NaN
5          1  1_365       buy      1
6          2    NaN       fee    NaN
7          2  2_284       buy      1
8          2  2_405       buy      2

Tags: python, pandas, dataframe

Solution

Here are two approaches.

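The first approach gives every row with the same ['client_id', 'key'] pair the same 'pos_id' within a 'client_id', whether or not those rows appear consecutively.

Use DataFrame.where to mask the rows to ignore; groupby + ngroup with sort=False then numbers the unique combinations in order of first appearance. Subtracting the minimum group number within each 'client_id' and adding 1 gives a counter that starts at 1 for every client.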

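import numpy as np

# Mask the rows to ignore, then number each unique (client_id, key) pair.
s = (df.where(~df['operation'].isin(['transfer', 'fee']))
       .groupby(['client_id', 'key'], sort=False).ngroup()
       .replace(-1, np.nan))  # ngroup labels NaN group keys -1 in older pandas; harmless otherwise.

# Restart the numbering at 1 within each client_id.
df['pos_id'] = s - s.groupby(df['client_id']).transform('min') + 1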

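The second approach requires the input to be at least sorted by 'client_id', and it assigns identical keys the same 'pos_id' only when they appear consecutively. Drop the rows to ignore, then flag each row whose key or client differs from the previous row and take the cumsum of those flags within each 'client_id'.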

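# Drop the rows to ignore (they are all-NaN after masking).
s = (df.where(~df['operation'].isin(['transfer', 'fee']))
       .dropna(how='all'))

# True wherever the key or the client changes from the previous kept row.
s = s['key'].ne(s['key'].shift()) | s['client_id'].ne(s['client_id'].shift())
# Running count of changes per client; dropped rows get NaN on assignment.
df['pos_id'] = s.groupby(df['client_id']).cumsum()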
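The two approaches only disagree when a key comes back after a different key within the same client. The sketch below wraps each approach in a helper and runs both on a small made-up frame; the function names pos_id_by_group and pos_id_by_run, the SKIP list and the demo data are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

SKIP = ['transfer', 'fee']  # operations that should not receive a pos_id

def pos_id_by_group(df):
    """First approach: same (client_id, key) pair -> same pos_id."""
    masked = df.where(~df['operation'].isin(SKIP))
    s = (masked.groupby(['client_id', 'key'], sort=False).ngroup()
               .replace(-1, np.nan))
    return s - s.groupby(df['client_id']).transform('min') + 1

def pos_id_by_run(df):
    """Second approach: a new pos_id every time the key changes."""
    kept = df.where(~df['operation'].isin(SKIP)).dropna(how='all')
    new_run = (kept['key'].ne(kept['key'].shift())
               | kept['client_id'].ne(kept['client_id'].shift()))
    # Reindex so the skipped rows come back as NaN.
    return new_run.groupby(df['client_id']).cumsum().reindex(df.index)

# Hypothetical data: key 'a' reappears after 'b' for the same client.
demo = pd.DataFrame({
    'client_id': [0, 0, 0, 0],
    'key': ['a', 'b', np.nan, 'a'],
    'operation': ['buy', 'buy', 'fee', 'sell'],
})

by_group = pos_id_by_group(demo)
by_run = pos_id_by_run(demo)
print(pd.DataFrame({'by_group': by_group, 'by_run': by_run}))
```

On this frame the last row gets pos_id 1 from the first approach (it reuses the id of the earlier 'a') and 3 from the second (it starts a new run), so the choice depends on whether a returning key should count as the same position.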

For your input, either approach gives:

   client_id    key operation  pos_id
0          0  0_382       buy     1.0
1          0  0_382      sell     1.0
2          0  0_356      sell     2.0
3          1  1_365       buy     1.0
4          1    NaN  transfer     NaN
5          1  1_365       buy     1.0
6          2    NaN       fee     NaN
7          2  2_284       buy     1.0
8          2  2_405       buy     2.0
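The result above is a float column because NaN forces float64, whereas the requested output shows whole numbers. If integer display matters, one option is pandas' nullable integer dtype. This is a small sketch on a stand-alone Series holding the same values as the 'pos_id' column; it is an optional extra and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same values as the 'pos_id' column in the result above.
pos_id = pd.Series([1.0, 1.0, 2.0, 1.0, np.nan, 1.0, np.nan, 1.0, 2.0])

# 'Int64' (capital I) is the nullable integer dtype; NaN becomes <NA>.
pos_id_int = pos_id.astype('Int64')
print(pos_id_int)
```

On the real frame the equivalent would be df['pos_id'] = df['pos_id'].astype('Int64'), and the skipped rows then show <NA> rather than NaN.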
