首页 > 解决方案 > Pandas Dataframe:在 12 小时内删除带有 DateTime 的行

问题描述

我有这个数据框:

对于每个 IMEI,我想检查是否DATETIME OF LVD在 12 小时内发生任何后续。如果他们这样做,他们需要被删除。

例如,在此 df 行中,需要删除 1、6、13、14、15 行。

               IMEI      DATETIME OF LVD  
0   864811031001402  2018-10-04 23:50:00         
1   864811031001402  2018-10-05 04:35:00         
2   864811031001402  2018-10-15 03:40:00       
3   864811031001402  2018-10-21 04:25:00        
4   866710038341548  2018-10-27 05:53:00       
5   864811031092336  2018-10-17 18:10:00         
6   864811031092336  2018-10-17 18:41:00       
7   864811031092336  2018-10-21 04:50:00          
8   864811031092336  2018-10-23 03:21:00         
9   864811031092336  2018-10-24 03:00:00        
10  864811031009041  2018-10-13 21:52:00       
11  864811031009041  2018-10-27 11:13:00       
12  864811031015584  2018-10-27 00:48:00        
13  864811031015584  2018-10-28 05:25:00        
14  864811031015584  2018-10-28 05:26:00        
15  864811031015584  2018-10-28 05:27:00   

我可以获得每条记录的增量时间差(如下),但是如何为每个 IMEI 组做呢?

df['Delta'] = pd.to_datetime(df['DATETIME OF LVD']).diff()

               IMEI      DATETIME OF LVD              Delta
0   864811031001402  2018-10-04 23:50:00                NaT
1   864811031001402  2018-10-05 04:35:00    0 days 04:45:00
2   864811031001402  2018-10-15 03:40:00    9 days 23:05:00
3   864811031001402  2018-10-21 04:25:00    6 days 00:45:00
4   866710038341548  2018-10-27 05:53:00    6 days 01:28:00
5   864811031092336  2018-10-17 18:10:00 -10 days +12:17:00
6   864811031092336  2018-10-17 18:41:00    0 days 00:31:00
7   864811031092336  2018-10-21 04:50:00    3 days 10:09:00
8   864811031092336  2018-10-23 03:21:00    1 days 22:31:00
9   864811031092336  2018-10-24 03:00:00    0 days 23:39:00
10  864811031009041  2018-10-13 21:52:00 -11 days +18:52:00
11  864811031009041  2018-10-27 11:13:00   13 days 13:21:00
12  864811031015584  2018-10-27 00:48:00  -1 days +13:35:00
13  864811031015584  2018-10-28 05:25:00    1 days 04:37:00
14  864811031015584  2018-10-28 05:26:00    0 days 00:01:00
15  864811031015584  2018-10-28 05:27:00    0 days 00:01:00

标签: pythonpandasdataframe

解决方案


通过 2 个布尔掩码使用DataFrameGroupBy.diff和过滤- 与按位boolean indexing检查缺失行的 Timedelta 链式比较:|OR

df['DATETIME OF LVD'] = pd.to_datetime(df['DATETIME OF LVD'])

s = df.groupby('IMEI')['DATETIME OF LVD'].diff()
df = df[(s > pd.Timedelta('12 hour')) | s.isna()]
print (df)
               IMEI     DATETIME OF LVD
0   864811031001402 2018-10-04 23:50:00
2   864811031001402 2018-10-15 03:40:00
3   864811031001402 2018-10-21 04:25:00
4   866710038341548 2018-10-27 05:53:00
5   864811031092336 2018-10-17 18:10:00
7   864811031092336 2018-10-21 04:50:00
8   864811031092336 2018-10-23 03:21:00
9   864811031092336 2018-10-24 03:00:00
10  864811031009041 2018-10-13 21:52:00
11  864811031009041 2018-10-27 11:13:00
12  864811031015584 2018-10-27 00:48:00
13  864811031015584 2018-10-28 05:25:00

详情

print (s)
0                 NaT
1     0 days 04:45:00
2     9 days 23:05:00
3     6 days 00:45:00
4                 NaT
5                 NaT
6     0 days 00:31:00
7     3 days 10:09:00
8     1 days 22:31:00
9     0 days 23:39:00
10                NaT
11   13 days 13:21:00
12                NaT
13    1 days 04:37:00
14    0 days 00:01:00
15    0 days 00:01:00
Name: DATETIME OF LVD, dtype: timedelta64[ns]

推荐阅读