What is an efficient way to compute the number of days since the last maintenance?

Problem description

Here is the sample dataset I am working with:

            maint id
datetime            
2015-01-01    1.0  a
2015-01-02    NaN  a
2015-01-03    NaN  a
2015-01-04    1.0  a
2015-01-05    NaN  a
2015-01-06    NaN  a
2015-01-07    NaN  a
2015-01-01    NaN  b
2015-01-02    NaN  b
2015-01-03    1.0  b
2015-01-04    1.0  b
2015-01-05    NaN  b
2015-01-06    NaN  b
2015-01-07    NaN  b

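What I want is the number of days elapsed since df['maint'] was last 1 for the same id. Rows that have no earlier maintenance for their id should get 0: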

            maint id  days
datetime                  
2015-01-01    1.0  a     0
2015-01-02    NaN  a     1
2015-01-03    NaN  a     2
2015-01-04    1.0  a     0
2015-01-05    NaN  a     1
2015-01-06    NaN  a     2
2015-01-07    NaN  a     3
2015-01-01    NaN  b     0
2015-01-02    NaN  b     0
2015-01-03    1.0  b     0
2015-01-04    1.0  b     0
2015-01-05    NaN  b     1
2015-01-06    NaN  b     2
2015-01-07    NaN  b     3

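I have several thousand different IDs, each with several years of maintenance records, so I am looking for an efficient way to compute this day difference.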

Tags: python, pandas

Solution


Use:

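import pandas as pd
# Keep the row date where maint is 1; every other row becomes NaT
df['days'] = df.index.where(df['maint'].eq(1))
# Forward-fill the last maintenance date per id, subtract it from the row date, convert to days
df['days'] = (df.index - df.groupby('id')['days'].ffill()).fillna(pd.Timedelta(0)).dt.days
print(df)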
            maint id  days
datetime                  
2015-01-01    1.0  a     0
2015-01-02    NaN  a     1
2015-01-03    NaN  a     2
2015-01-04    1.0  a     0
2015-01-05    NaN  a     1
2015-01-06    NaN  a     2
2015-01-07    NaN  a     3
2015-01-01    NaN  b     0
2015-01-02    NaN  b     0
2015-01-03    1.0  b     0
2015-01-04    1.0  b     0
2015-01-05    NaN  b     1
2015-01-06    NaN  b     2
2015-01-07    NaN  b     3

Explanation

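  1. First create a new column days that holds the value of df.index where maint is 1 and NaT everywhere else.
  2. Forward-fill that column within each id using GroupBy.ffill and subtract the resulting Series from the index. Rows that precede the first maintenance of their id are still NaT, so replace them with a zero timedelta, and finally convert the result to whole days with Series.dt.days.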
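
The two steps above can be tried end to end with the following minimal sketch. It rebuilds the sample data from the question and splits the one-liner into its intermediate stages. The function name days_since_maintenance and the helper column last_maint are made up for illustration and do not appear in the original answer. The sketch also assumes the frame is sorted by date within each id, as the sample is.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question: 7 daily rows for each of two ids.
dates = pd.date_range("2015-01-01", periods=7, freq="D")
df = pd.DataFrame(
    {
        "maint": [1, np.nan, np.nan, 1, np.nan, np.nan, np.nan,
                  np.nan, np.nan, 1, 1, np.nan, np.nan, np.nan],
        "id": ["a"] * 7 + ["b"] * 7,
    },
    index=pd.DatetimeIndex(dates.append(dates), name="datetime"),
)


def days_since_maintenance(frame):
    """Hypothetical helper: add a 'days' column counting days since maint == 1."""
    out = frame.copy()
    # Step 1: the row date where maint is 1, NaT everywhere else.
    out["last_maint"] = out.index.where(out["maint"].eq(1).to_numpy())
    # Step 2a: carry the most recent maintenance date forward within each id.
    filled = out.groupby("id")["last_maint"].ffill()
    # Step 2b: row date minus last maintenance date; rows before the first
    # maintenance of an id are NaT and are set to a zero timedelta.
    elapsed = (out.index - filled).fillna(pd.Timedelta(0))
    out["days"] = elapsed.dt.days
    return out.drop(columns="last_maint")


result = days_since_maintenance(df)
print(result)
```

Because the forward fill runs once per group in vectorized form and no Python-level loop visits individual rows, this approach should remain practical for thousands of ids, although that has not been benchmarked here.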
