Pandas: calculate the time difference between columns when a condition is met

Problem description

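I have a pandas df with 2 columns: Day (dates in datetime format) and Number of breakdowns.

I want to create two new columns. The first is the number of days since the previous breakdown event (which I have already done). The second is the number of days until the next breakdown event (which I am struggling with).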

Day            Number of breakdowns    Days from Previous Breakdown Event
2017-01-09                   0.0                                   0                                             
2017-01-12                   0.0                                   0
2017-01-13                   0.0                                   0
2017-01-14                   0.0                                   0
2017-01-16                   1.0                                   0
2017-01-17                   0.0                                   1
2017-01-18                   0.0                                   2
2017-01-19                   1.0                                   0
2017-01-20                   0.0                                   1
2017-01-21                   0.0                                   2
2017-01-23                   1.0                                   0
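To try the snippets in this thread, the sample table can be rebuilt as a DataFrame. A minimal sketch, assuming the input holds only the Day and Number of breakdowns columns shown above (dates and counts copied from the table):

```python
import pandas as pd

# Sample data copied from the question's table; 'Day' is parsed to datetime64
df = pd.DataFrame({
    'Day': pd.to_datetime([
        '2017-01-09', '2017-01-12', '2017-01-13', '2017-01-14',
        '2017-01-16', '2017-01-17', '2017-01-18', '2017-01-19',
        '2017-01-20', '2017-01-21', '2017-01-23',
    ]),
    'Number of breakdowns': [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
})
print(df.dtypes)
```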

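Days from Previous Breakdown Event counts the days elapsed since the most recent breakdown; rows before the first breakdown are set to 0.

Code: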
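The same quantity can also be computed without groupby, by carrying the date of the most recent breakdown forward with ffill. This is an alternative technique, not the asker's code; it is a sketch that assumes the sample data from the table and leaves rows before the first breakdown as NaN instead of 0:

```python
import pandas as pd

df = pd.DataFrame({
    'Day': pd.to_datetime([
        '2017-01-09', '2017-01-12', '2017-01-13', '2017-01-14',
        '2017-01-16', '2017-01-17', '2017-01-18', '2017-01-19',
        '2017-01-20', '2017-01-21', '2017-01-23',
    ]),
    'Number of breakdowns': [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
})

# Keep the date only on breakdown rows (NaT elsewhere)
breakdown_day = df['Day'].where(df['Number of breakdowns'].ne(0))
# Carry the most recent breakdown date forward to the following rows
prev_day = breakdown_day.ffill()
# Rows before the first breakdown have no previous date, so they become NaN
days_prev = (df['Day'] - prev_day).dt.days
df['Days from Previous Breakdown Event'] = days_prev
print(df)
```

If the question's convention of 0 for those leading rows is wanted, append `.fillna(0).astype(int)` to `days_prev`.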

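# Each breakdown starts a new group; 'first' is the date that opens the group
s = df.groupby(df['Number of breakdowns'].ne(0).cumsum())['Day'].transform('first')
df['Days from Previous Breakdown Event'] = (df['Day'] - s).dt.days
# Rows up to the first breakdown have no previous event, so set them to 0
zeros_index = df['Number of breakdowns'].ne(0).idxmax()
df.loc[:zeros_index, 'Days from Previous Breakdown Event'] = 0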
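The core of this code is the grouping key ne(0).cumsum(): a running count of breakdowns that increases by one on each breakdown row, so every breakdown starts a new group together with the quiet days that follow it. A small sketch printing that key, and the idxmax() label used for zeros_index, for the sample's breakdown counts:

```python
import pandas as pd

breakdowns = pd.Series([0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0])
# True on every breakdown row
is_breakdown = breakdowns.ne(0)
# Running count of breakdowns: each breakdown opens a new group
group_key = is_breakdown.cumsum()
print(group_key.tolist())    # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
# idxmax on a boolean Series returns the label of the first True
print(is_breakdown.idxmax())  # 4
```

Within each group, transform('first') therefore returns the breakdown's date. Group 0 (the rows before the first breakdown) contains no breakdown at all, which is why the code overwrites those rows with 0.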

I need help with the Days from Next Breakdown Event column. I want it to look like this:

Day            Number of breakdowns    Days from Next Breakdown Event
2017-01-09                   0.0                                   7                                             
2017-01-12                   0.0                                   4
2017-01-13                   0.0                                   3
2017-01-14                   0.0                                   2
2017-01-16                   1.0                                   0
2017-01-17                   0.0                                   2
2017-01-18                   0.0                                   1
2017-01-19                   1.0                                   0
2017-01-20                   0.0                                   3
2017-01-21                   0.0                                   2
2017-01-23                   1.0                                   0

Tags: python, pandas, datetime, difference

Solution


Reverse the order with iloc[::-1] before the cumsum, use transform('last') instead of 'first', and swap the subtraction to s - df['Day']:

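# Counting from the bottom up, each breakdown closes its group, so 'last' is its date
s = df.groupby(df['Number of breakdowns'].iloc[::-1].ne(0).cumsum())['Day'].transform('last')
# Next breakdown date minus the current day
df['Days from Next Breakdown Event'] = (s - df['Day']).dt.days
print(df)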
          Day  Number of breakdowns  Days from Previous Breakdown Event  \
0  2017-01-09                   0.0                                   0   
1  2017-01-12                   0.0                                   0   
2  2017-01-13                   0.0                                   0   
3  2017-01-14                   0.0                                   0   
4  2017-01-16                   1.0                                   0   
5  2017-01-17                   0.0                                   1   
6  2017-01-18                   0.0                                   2   
7  2017-01-19                   1.0                                   0   
8  2017-01-20                   0.0                                   1   
9  2017-01-21                   0.0                                   2   
10 2017-01-23                   1.0                                   0   

    Days from Next Breakdown Event  
0                                7  
1                                4  
2                                3  
3                                2  
4                                0  
5                                2  
6                                1  
7                                0  
8                                3  
9                                2  
10                               0  
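Why the reversed version works: counting breakdowns from the bottom up makes each breakdown the last row of its group, together with the quiet days before it, so transform('last') picks the breakdown's date. groupby aligns a Series key on the index, so the reversed order of the key does not scramble the rows. A small sketch showing the reversed key in the original row order for the sample data:

```python
import pandas as pd

breakdowns = pd.Series([0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0])
# Count breakdowns from the last row upwards: each breakdown now closes a group
# together with the quiet days *before* it
rev_key = breakdowns.iloc[::-1].ne(0).cumsum()
# Restore the original row order to see which group each row falls into
print(rev_key.sort_index().tolist())  # [3, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1]
```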

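Details: the intermediate Series s holds, for each row, the date of the next breakdown (on a breakdown row, that row's own date):

print(s)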
0    2017-01-16
1    2017-01-16
2    2017-01-16
3    2017-01-16
4    2017-01-16
5    2017-01-19
6    2017-01-19
7    2017-01-19
8    2017-01-23
9    2017-01-23
10   2017-01-23
Name: Day, dtype: datetime64[ns]
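One edge case the sample data does not exercise: rows after the last breakdown have no next event. With the groupby version they form their own group, and transform('last') returns that group's final day rather than a breakdown date, so they get a number that refers to no breakdown. An alternative sketch (a different technique from the answer above) back-fills the next breakdown date with bfill and leaves those rows as NaN. The 2017-01-24 row is hypothetical, appended only to show the edge case:

```python
import pandas as pd

# Sample data from the question plus a HYPOTHETICAL trailing row (2017-01-24, no breakdown)
df = pd.DataFrame({
    'Day': pd.to_datetime([
        '2017-01-09', '2017-01-12', '2017-01-13', '2017-01-14',
        '2017-01-16', '2017-01-17', '2017-01-18', '2017-01-19',
        '2017-01-20', '2017-01-21', '2017-01-23', '2017-01-24',
    ]),
    'Number of breakdowns': [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
})

# Keep the date only on breakdown rows (NaT elsewhere)
breakdown_day = df['Day'].where(df['Number of breakdowns'].ne(0))
# Fill each gap backwards with the next breakdown's date
next_day = breakdown_day.bfill()
# Rows after the last breakdown have no next date, so their difference is NaN
df['Days from Next Breakdown Event'] = (next_day - df['Day']).dt.days
print(df)
```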
