python - Finding start-time and end-time of events in a day - Pandas timeseries - such that end time does not fall into next day
Problem description
I have a meteorological timeseries df:
import numpy as np
import pandas as pd
df = pd.DataFrame({'date': ['11/10/2017 0:00', '11/10/2017 03:00', '11/10/2017 06:00', '11/10/2017 09:00', '11/10/2017 12:00',
                            '11/11/2017 0:00', '11/11/2017 03:00', '11/11/2017 06:00', '11/11/2017 09:00', '11/11/2017 12:00',
                            '11/12/2017 00:00', '11/12/2017 03:00', '11/12/2017 06:00', '11/12/2017 09:00', '11/12/2017 12:00'],
                   'value': [850, np.nan, np.nan, np.nan, np.nan, 500, 650, 780, np.nan, 800, 350, 690, 780, np.nan, np.nan]})
df['date'] = pd.to_datetime(df.date.astype(str), format='%m/%d/%Y %H:%M', errors='coerce')
df.index = pd.DatetimeIndex(df.date)
With this dataframe, I am trying to find the start time and end time of each event, where an event is defined by:
(df["value"] < 1000)
I used a solution similar to "How to find the start time and end time of an event in python?", with revised code:
current_event = None
result = []
for event, time in zip((df["value"] < 1000), df.index):
    if event != current_event:
        if current_event is not None:
            result.append([current_event, start_time, time - pd.DateOffset(hours=1, minutes=30)])
        current_event, start_time = event, time - pd.DateOffset(hours=1, minutes=30)
df = pd.DataFrame(result, columns=['Event', 'StartTime', 'EndTime'])
df
Output is:
Event StartTime EndTime
0 True 2017-11-09 22:30:00 2017-11-10 01:30:00
1 False 2017-11-10 01:30:00 2017-11-10 22:30:00
2 True 2017-11-10 22:30:00 2017-11-11 07:30:00
3 False 2017-11-11 07:30:00 2017-11-11 10:30:00
4 True 2017-11-11 10:30:00 2017-11-12 07:30:00
The desired output differs from the output above:
The EndTime in the second row (index 1) should be 2017-11-10 13:30:00.
The EndTime in the fifth row (index 4) should be 2017-11-11 13:30:00.
New sixth and seventh rows (index 5 and 6) should be added.
Logic:
Since the timestamps are 3 hours apart, an event is assumed to start 1 hour 30 minutes before its timestamp and end 1 hour 30 minutes after it.
If two consecutive timestamps have the same event value, they are merged: the event runs from 1 hour 30 minutes before the first timestamp to 1 hour 30 minutes after the second, and so on for longer runs.
The StartTime of the first event of the day (at 00:00) should always be 1 hour 30 minutes before the 00:00 timestamp, i.e. 22:30 of the previous day.
The EndTime of the last event of the day (at 12:00) should always be 1 hour 30 minutes after the 12:00 timestamp, i.e. 13:30 of the same day.
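The windowing rule can be sketched for a single timestamp as follows. This is only an illustration of the rule as stated; the names `HALF_STEP` and `event_window` are hypothetical and do not appear in the original post.

```python
import pandas as pd

# Half of the 3-hour sampling interval (assumed from the rule stated above).
HALF_STEP = pd.Timedelta(hours=1, minutes=30)

def event_window(timestamp):
    """Return the (start, end) window centred on one observation."""
    ts = pd.Timestamp(timestamp)
    return ts - HALF_STEP, ts + HALF_STEP

# The 00:00 observation starts on the previous day at 22:30.
start, end = event_window("2017-11-10 00:00")
print(start, end)

# The 12:00 observation ends on the same day at 13:30.
last_start, last_end = event_window("2017-11-10 12:00")
print(last_start, last_end)
```

Because each window is derived from its own timestamp only, the end of the last window of a day never depends on the first timestamp of the next day.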
Any prompt help on this issue would be highly appreciated. Tried to fix it desperately but no luck yet.
Thanks a lot!
Solution
Create the output dataframe:
out = pd.DataFrame({"Event": df["value"] < 1000,
                    "StartTime": df["date"] - pd.DateOffset(hours=1, minutes=30),
                    "EndTime": df["date"] + pd.DateOffset(hours=1, minutes=30)},
                   index=df.index)
>>> out
Event StartTime EndTime
0 True 2017-11-09 22:30:00 2017-11-10 01:30:00 # Group 0
1 False 2017-11-10 01:30:00 2017-11-10 04:30:00 # Group 1
2 False 2017-11-10 04:30:00 2017-11-10 07:30:00
3 False 2017-11-10 07:30:00 2017-11-10 10:30:00
4 False 2017-11-10 10:30:00 2017-11-10 13:30:00
5 True 2017-11-10 22:30:00 2017-11-11 01:30:00 # Group 2
6 True 2017-11-11 01:30:00 2017-11-11 04:30:00
7 True 2017-11-11 04:30:00 2017-11-11 07:30:00
8 False 2017-11-11 07:30:00 2017-11-11 10:30:00 # Group 3
9 True 2017-11-11 10:30:00 2017-11-11 13:30:00 # Group 4
10 True 2017-11-11 22:30:00 2017-11-12 01:30:00 # Group 5
11 True 2017-11-12 01:30:00 2017-11-12 04:30:00
12 True 2017-11-12 04:30:00 2017-11-12 07:30:00
13 False 2017-11-12 07:30:00 2017-11-12 10:30:00 # Group 6
14 False 2017-11-12 10:30:00 2017-11-12 13:30:00
Define some helper groups:
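# Increments whenever the Event flag differs from the previous row.
event_group = out["Event"].ne(out["Event"].shift(fill_value=0)).cumsum()
# Increments whenever a row's StartTime is not the previous row's EndTime (a gap between days).
time_group = (out["StartTime"]
              - out["EndTime"].shift(fill_value=out["StartTime"].iloc[0])
              != pd.Timedelta(0)).cumsum()
>>> out[["Event"]].assign(EventGroup=event_group,
...                       TimeGroup=time_group,
...                       Groups=event_group + time_group)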
Event EventGroup TimeGroup Groups
0 True 1 0 1 # Group 0
1 False 2 0 2 # Group 1
2 False 2 0 2
3 False 2 0 2
4 False 2 0 2
5 True 3 1 4 # Group 2
6 True 3 1 4
7 True 3 1 4
8 False 4 1 5 # Group 3
9 True 5 1 6 # Group 4
10 True 5 2 7 # Group 5
11 True 5 2 7
12 True 5 2 7
13 False 6 2 8 # Group 6
14 False 6 2 8
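The grouping trick used here can be shown on a toy series. The sketch below is not part of the original answer; `flags`, `run_id`, `gap` and `combined` are hypothetical names, and the `gap` values are made up to stand in for a day boundary.

```python
import pandas as pd

flags = pd.Series([True, False, False, True, True])

# A row starts a new run when its value differs from the previous row's.
# shift() leaves a missing value in the first slot, so the first row
# always counts as a change.
run_id = flags.ne(flags.shift()).cumsum()
print(run_id.tolist())

# Hypothetical gap counter: it increments at the last row, as if a day
# boundary separated the last two observations.
gap = pd.Series([0, 0, 0, 0, 1])

# Both counters never decrease, so their sum changes exactly where at
# least one of them changes.
combined = run_id + gap
print(combined.tolist())
```

In the toy data the last two rows share the same flag, yet they end up in different combined groups because the gap counter steps between them. The same effect separates rows 9 and 10 in the table above.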
Reduce the output dataframe:
out = pd.DataFrame(out.groupby(event_group + time_group)
                      .apply(lambda g: (g["Event"].iloc[0],
                                        g["StartTime"].iloc[0],
                                        g["EndTime"].iloc[-1]))
                      .tolist(), columns=["Event", "StartTime", "EndTime"])
>>> out
Event StartTime EndTime
0 True 2017-11-09 22:30:00 2017-11-10 01:30:00
1 False 2017-11-10 01:30:00 2017-11-10 13:30:00
2 True 2017-11-10 22:30:00 2017-11-11 07:30:00
3 False 2017-11-11 07:30:00 2017-11-11 10:30:00
4 True 2017-11-11 10:30:00 2017-11-11 13:30:00
5 True 2017-11-11 22:30:00 2017-11-12 07:30:00
6 False 2017-11-12 07:30:00 2017-11-12 13:30:00
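For convenience, the three steps can be combined into one self-contained script. This is a sketch rather than the answer's exact code. It rebuilds the sample data with a default integer index (the question's own code later overwrites `df`), uses `shift()` without `fill_value`, and swaps the `apply(...).tolist()` reduction for `groupby(...).agg(...)` with `first` and `last`. The names `half` and `summary` are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data: five 3-hourly readings on each of three days.
days = ["2017-11-10", "2017-11-11", "2017-11-12"]
hours = ["00:00", "03:00", "06:00", "09:00", "12:00"]
df = pd.DataFrame({
    "date": pd.to_datetime([f"{d} {h}" for d in days for h in hours]),
    "value": [850, np.nan, np.nan, np.nan, np.nan,
              500, 650, 780, np.nan, 800,
              350, 690, 780, np.nan, np.nan],
})

# Step 1: one window per reading, 1 h 30 min on either side.
half = pd.Timedelta(hours=1, minutes=30)
out = pd.DataFrame({
    "Event": df["value"] < 1000,   # NaN compares as False
    "StartTime": df["date"] - half,
    "EndTime": df["date"] + half,
})

# Step 2: counters that step when the flag changes or when windows are
# not contiguous (the overnight gap between 13:30 and 22:30).
event_group = out["Event"].ne(out["Event"].shift()).cumsum()
time_group = out["StartTime"].ne(out["EndTime"].shift()).cumsum()

# Step 3: collapse each group to its first start and last end.
summary = (out.groupby(event_group + time_group)
              .agg(Event=("Event", "first"),
                   StartTime=("StartTime", "first"),
                   EndTime=("EndTime", "last"))
              .reset_index(drop=True))
print(summary)
```

The group labels differ from those in the helper table because both counters start at 1 here, but only the boundaries between groups matter for the reduction.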