python - Conditional running count of all previous rows in Pandas

Problem description

Suppose I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'], 
                    'Date': ['2019-01-01', '2019-02-01', '2019-03-01', '2019-03-01', '2019-02-15', 
                             '2019-03-15', '2019-04-05', '2019-04-05', '2019-04-15', '2019-06-10'],
                    'Sale':[100,200,150,200,150,100,300,250,500,400]})
df['Date'] = pd.to_datetime(df['Date'])
df

Event         Date
    A   2019-01-01
    B   2019-02-01
    A   2019-03-01
    A   2019-03-01
    B   2019-02-15
    C   2019-03-15
    B   2019-04-05
    B   2019-04-05
    A   2019-04-15
    C   2019-06-10

I would like to obtain the following result:

Event         Date  Previous_Event_Count
    A   2019-01-01                     0
    B   2019-02-01                     0
    A   2019-03-01                     1
    A   2019-03-01                     1
    B   2019-02-15                     1
    C   2019-03-15                     0
    B   2019-04-05                     2
    B   2019-04-05                     2
    A   2019-04-15                     3
    C   2019-06-10                     1

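where df['Previous_Event_Count'] is the number of events (rows) in which the event (df['Event']) took place before its adjacent date (df['Date']). For example,

the number of times event A took place before 2019-01-01 is 0,
the number of times event A took place before 2019-03-01 is 1, and
the number of times event A took place before 2019-04-15 is 3.

I was able to get the desired result with this line: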

df['Previous_Event_Count'] = [df.loc[(df.loc[i, 'Event'] == df['Event']) & (df.loc[i, 'Date'] > df['Date']), 
                                     'Date'].count() for i in range(len(df))]

It works correctly, but it is slow. I believe there is a better way to do this. I tried this line:

df['Previous_Event_Count'] = df.query('Date < Date').groupby(['Event', 'Date']).cumcount()

but it produces NaN.
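A likely explanation for the NaN values, sketched on a three-row frame made up for illustration: inside `query`, `Date < Date` compares each row's date with itself, so no row passes the filter. The grouped count is then empty, and assigning an empty Series back to the frame aligns on the index and fills every row with NaN.

```python
import pandas as pd

# Small illustrative frame (not the full data from the question)
df = pd.DataFrame({
    'Event': ['A', 'B', 'A'],
    'Date': pd.to_datetime(['2019-01-01', '2019-02-01', '2019-03-01']),
})

# Each row's Date is compared with itself, so the condition is never True
filtered = df.query('Date < Date')
print(len(filtered))

# cumcount on an empty frame gives an empty Series
counts = filtered.groupby(['Event', 'Date']).cumcount()

# Assignment aligns on the index; no index label matches, so all rows become NaN
df['Previous_Event_Count'] = counts
print(df)
```

In other words, `query` has no notion of "the current row versus all other rows"; it only evaluates a row-wise boolean expression.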

Tags: python, pandas, count, pandas-groupby

Solution

`groupby`+`rank`

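Dates can be treated as numbers, so they can be ranked. Use method='min' to get your counting logic: within each Event group, tied dates all receive the lowest rank of the tie, so rank minus 1 equals the number of rows in that group with a strictly earlier date.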
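To illustrate why the tie-breaking rule matters, here is a minimal sketch on a standalone Series holding the four dates of event A from the question. The variable names are made up for this example. With duplicates present, only `method='min'` yields "number of strictly earlier rows" after subtracting 1.

```python
import pandas as pd

# Dates of event A, including the duplicated 2019-03-01
dates = pd.Series(pd.to_datetime(
    ['2019-01-01', '2019-03-01', '2019-03-01', '2019-04-15']))

rank_min = dates.rank(method='min')        # ties share the lowest rank
rank_avg = dates.rank(method='average')    # ties share the mean rank (the default)
rank_dense = dates.rank(method='dense')    # ties share a rank, no gaps afterwards

# Subtracting 1 from the 'min' rank counts strictly earlier rows
prev_min = (rank_min - 1).astype(int).tolist()
prev_dense = (rank_dense - 1).astype(int).tolist()

print(prev_min)    # [0, 1, 1, 3]
print(prev_dense)  # [0, 1, 1, 2]
```

The 'dense' variant undercounts the last row because it counts distinct earlier dates rather than earlier rows.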

df['PEC'] = (df.groupby('Event').Date.rank(method='min')-1).astype(int)

  Event       Date  PEC
0     A 2019-01-01    0
1     B 2019-02-01    0
2     A 2019-03-01    1
3     A 2019-03-01    1
4     B 2019-02-15    1
5     C 2019-03-15    0
6     B 2019-04-05    2
7     B 2019-04-05    2
8     A 2019-04-15    3
9     C 2019-06-10    1
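As a sanity check, the rank-based column can be compared against a row-by-row count on the data from the question. This is only a sketch: the names `brute` and `ranked` are introduced here for illustration, and the brute-force loop is a simplified variant of the original one-liner.

```python
import pandas as pd

df = pd.DataFrame({
    'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
    'Date': ['2019-01-01', '2019-02-01', '2019-03-01', '2019-03-01', '2019-02-15',
             '2019-03-15', '2019-04-05', '2019-04-05', '2019-04-15', '2019-06-10'],
    'Sale': [100, 200, 150, 200, 150, 100, 300, 250, 500, 400],
})
df['Date'] = pd.to_datetime(df['Date'])

# Brute force: for every row, count rows with the same Event and an earlier Date
brute = [
    int(((df['Event'] == df.loc[i, 'Event']) & (df['Date'] < df.loc[i, 'Date'])).sum())
    for i in range(len(df))
]

# Vectorized: rank dates within each Event group, ties get the lowest rank
ranked = (df.groupby('Event')['Date'].rank(method='min') - 1).astype(int)

print(brute)
print(ranked.tolist())
print(brute == ranked.tolist())
```

The loop does a full scan of the frame per row, which is roughly quadratic in the number of rows, while the grouped rank is computed in a single pass per group.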
