How do I count days backward from a specific point with Python pandas?

Problem description

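I am trying to compute a negative difference: the number of days between values in the 'date' column for each id. The count resets for each id whenever the 'reset' column is 'Y'.

However, I want the count to start from the reset date.

This is what I am doing at the moment, along with the output I get: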

import numpy as np
import pandas as pd

df = pd.DataFrame({'reset':['N','Y','N','N','N','Y','N'],
                   'date':['2019-09-04','2020-12-06','2020-12-06','2019-09-07','2019-11-08','2021-05-21','2021-06-23'],
                   'id':[16860,16860,16860,16860,16860,16860,16860]
                   })

# the date strings use '-' separators, so the format must match them
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.sort_values(['id','date'])

#create extra grouping column based on reset day
df['group'] = df['reset'].replace({'N':False,'Y':True})
df['group'] = df.groupby('id')['group'].cumsum()

df = df.sort_values(by='date', ascending=False)
df['negative difference'] =  df.groupby(['id','group'])['date'].diff() / np.timedelta64(1, 'D')
df['negative difference'] = df.groupby(['id','group'])['negative difference'].cumsum().fillna(0)
df = df.sort_values(['id','date'])


print(df)

  reset       date     id  group  negative difference
0     N 2019-09-04  16860    0.0                -65.0
3     N 2019-09-07  16860    0.0                -62.0
4     N 2019-11-08  16860    0.0                  0.0
1     Y 2020-12-06  16860    1.0                  0.0
2     N 2020-12-06  16860    1.0                  0.0
5     Y 2021-05-21  16860    2.0                -33.0
6     N 2021-06-23  16860    2.0                  0.0

However, I would like the output to look like this:

  reset       date     id  group  negative difference
0     N 2019-09-04  16860    0.0               -459.0
3     N 2019-09-07  16860    0.0               -456.0
4     N 2019-11-08  16860    0.0               -394.0
1     Y 2020-12-06  16860    1.0                  0.0
2     N 2020-12-06  16860    1.0               -166.0
5     Y 2021-05-21  16860    2.0                  0.0
6     N 2021-06-23  16860    2.0                  0.0
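
As a quick check on the expected column, each value appears to be the row's date minus the next reset date on or after it. A minimal sketch of that arithmetic for three of the rows, using only `pd.Timestamp` subtraction (the variable names are illustrative):

```python
import pandas as pd

# Reset dates taken from the example data above.
first_reset = pd.Timestamp('2020-12-06')
second_reset = pd.Timestamp('2021-05-21')

# Rows dated before the first reset count back from 2020-12-06.
diff_row_0 = (pd.Timestamp('2019-09-04') - first_reset).days
diff_row_4 = (pd.Timestamp('2019-11-08') - first_reset).days

# The 'N' row dated 2020-12-06 counts back from the second reset.
diff_row_2 = (pd.Timestamp('2020-12-06') - second_reset).days

print(diff_row_0, diff_row_4, diff_row_2)  # -459 -394 -166
```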

Tags: python, pandas, dataframe, numpy, time-series

Solution


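You are on the right track; you just need to build the groups in reverse order, so that each row shares a group with the next reset on or after it. This assumes df is still sorted by id and date, as it is at the end of your code.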

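# flag reset rows, then count the resets from the bottom up within each id
df['group'] = df['reset'].eq('Y')
df['group'] = df.iloc[::-1].groupby('id')['group'].cumsum()
# subtract the last date of each group (its reset date) and convert to days
df['negative difference'] = df['date'].sub(df.groupby(['id','group'])['date'].transform('last')) / np.timedelta64(1, 'D')
print(df)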
  reset       date     id  group  negative difference
0     N 2019-09-04  16860      2               -459.0
3     N 2019-09-07  16860      2               -456.0
4     N 2019-11-08  16860      2               -394.0
1     Y 2020-12-06  16860      2                  0.0
2     N 2020-12-06  16860      1               -166.0
5     Y 2021-05-21  16860      1                  0.0
6     N 2021-06-23  16860      0                  0.0
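
The snippet above depends on `df` already being sorted by `id` and `date`. Below is a self-contained sketch of the same reverse-cumsum idea. The function name `days_before_next_reset` is hypothetical, and sorting inside the function with a stable sort is an assumption, made so that a 'Y' row listed before an 'N' row on the same date stays first.

```python
import numpy as np
import pandas as pd


def days_before_next_reset(frame):
    """Return a sorted copy with days counted back from the next reset per id.

    Hypothetical helper: the name and the in-function sort are assumptions,
    not part of the original answer.
    """
    out = frame.copy()
    out['date'] = pd.to_datetime(out['date'], format='%Y-%m-%d')
    # Two stable sorts (mergesort) order rows by id, then date, while
    # keeping the input order of rows that tie on both keys.
    out = out.sort_values('date', kind='mergesort')
    out = out.sort_values('id', kind='mergesort')
    # Count resets from the bottom up within each id.
    out['group'] = out['reset'].eq('Y')
    out['group'] = out.iloc[::-1].groupby('id')['group'].cumsum()
    # The last row of each group is its reset row, or simply the final row
    # when no reset follows, so the differences are never positive.
    last_date = out.groupby(['id', 'group'])['date'].transform('last')
    out['negative difference'] = (out['date'] - last_date) / np.timedelta64(1, 'D')
    return out


df = pd.DataFrame({
    'reset': ['N', 'Y', 'N', 'N', 'N', 'Y', 'N'],
    'date': ['2019-09-04', '2020-12-06', '2020-12-06', '2019-09-07',
             '2019-11-08', '2021-05-21', '2021-06-23'],
    'id': [16860] * 7,
})
result = days_before_next_reset(df)
print(result)
```

Rows after the final reset of an id form their own group and count back from that group's last row. That matches the expected output here, where only one such row exists, but it may not be what you want when several rows follow the last reset.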
