首页 > 解决方案 > 根据 ID 和日期计算输入

问题描述

我有一个如下所示的数据集:

ID       Date      Input
1        1-Nov      A
1        1-Nov      B
1        2-Nov      A
1        3-Nov      A
1        3-Nov      B
1        4-Nov      B
2        1-Nov      B

我希望我的输出根据样式和日期计算输入的出现,因此输出将如下所示:

ID       Date      Input          Output
1        1-Nov      A              1          ---first occurrence of A
1        1-Nov      B              1             first occurrence of B
1        2-Nov      A              2             Count is two because A was there for Nov1 as well
1        3-Nov      A              3
1        3-Nov      B              1             B is reseted to 1 since it was not present for nov2
1        4-Nov      B              2
2        1-Nov      B              1             Again it is 1 because ID changed from 1 to 2

谁能帮助我如何根据 ID 和日期获取项目的输出计数。

标签: pythonpandasdataframemetadata

解决方案


我们用ID+组成组InputGroupBy并检查差值在哪里等于1 day。然后我们取这些的累积和booleans

df['Date'] = pd.to_datetime(df['Date'], format='%d-%b')

df['temp'] = (
    df.groupby(['ID', 'Input'])['Date'].diff()
    .fillna(pd.Timedelta(1, unit='day')).le(pd.Timedelta(1, unit='day'))
)

df['Output'] = df.groupby(['ID', 'Input'])['temp'].cumsum().astype(int)

df = df.drop(columns='temp').assign(Date=df['Date'].dt.strftime('%d-%b'))
   ID    Date Input  Output
0   1  01-Nov     A       1
1   1  01-Nov     B       1
2   1  02-Nov     A       2
3   1  03-Nov     A       3
4   1  03-Nov     B       1
5   1  04-Nov     B       2
6   2  01-Nov     B       1

推荐阅读