python - 在 Python 的 for 循环中追加多个时间序列数据
问题描述
我想遍历td1
下面的时间序列数据框以查找某些活动的起点和终点。
活动定义:
Type
列由medium
和low
值组成(您可以将medium
和low
视为两个独立的时间序列)。同样X
,如果a
轮到1
任何一个值Type
(例如, for X ==18
,a
变为1
whileType==medium
或a
变为1
while Type==low
),它标志着活动的开始,我想在这个时间戳记下Id
and分别为and ;Timestamp
Start_Id
StartTime
一旦活动开始,它就处于进行中状态。当活动正在进行时,如果两个a
值都轮流0
(即和),它标志着活动的结束(例如, for ,变为while并变为while ,遵循时间序列)。我想在这个时候分别删除和。 Type
medium
low
X ==18
a
0
Type==medium
a
0
Type==low
Id
Timestamp
End_Id
EndTime
最后,在以下情况下收集b
每个活动期间的所有值:
Type == medium
; 和a ==1
进入一个名为list_container
.
td1:
Timestamp X Y a b Type Id
0 2000-10-26 10:08:27.060 18 14 0.0 24.5 medium 18
1 2000-10-26 10:39:24.310 18 13 1.0 24.0 low 18 Start
2 2000-10-26 11:50:48.190 18 14 1.0 23.5 medium 18 ---- collect `b` value in `list_container` 1
3 2000-10-26 17:18:07.610 18 14 1.0 23.5 medium 18 ---- collect `b` value in `list_container` 1
4 2000-10-26 17:18:09.610 18 14 0.0 23.5 medium 18
5 2000-10-26 17:29:10.610 18 14 0.0 26.5 medium 18
6 2000-10-26 17:29:10.770 18 14 1.0 26.5 medium 18 ---- collect `b` value in `list_container` 1
7 2000-10-26 17:29:12.610 18 14 1.0 53.5 medium 18 ---- collect `b` value in `list_container` 1
8 2000-10-26 17:29:14.610 18 14 1.0 62.0 medium 18 ---- collect `b` value in `list_container` 1
9 2000-10-26 17:29:14.770 18 13 1.0 24.0 low 18
10 2000-10-26 17:29:16.610 18 14 1.0 64.5 medium 18 ---- collect `b` value in `list_container` 1
11 2000-10-26 17:29:18.770 18 14 0.0 64.5 medium 18
12 2000-10-26 17:29:18.770 18 13 0.0 24.0 low 18 End
13 2000-10-26 17:29:28.770 18 14 0.0 63.5 medium 18
14 2000-10-26 17:29:34.770 19 14 0.0 62.0 medium 19
15 2000-10-26 17:29:40.770 19 14 1.0 61.0 medium 19 Start
16 2000-10-26 17:29:46.770 19 14 1.0 60.0 medium 19 ---- collect `b` value in `list_container` 2
17 2000-10-26 17:32:01.180 19 13 1.0 25.0 low 19
18 2000-10-26 17:32:01.180 19 14 0.0 51.5 low 19
19 2000-10-26 17:32:35.180 19 13 0.0 50.0 medium 19 End
可重现的例子:
td1 = pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
1: Timestamp('2000-10-26 10:39:24.310000'),
2: Timestamp('2000-10-26 11:50:48.190000'),
3: Timestamp('2000-10-26 17:18:07.610000'),
4: Timestamp('2000-10-26 17:18:09.610000'),
5: Timestamp('2000-10-26 17:29:10.610000'),
6: Timestamp('2000-10-26 17:29:10.770000'),
7: Timestamp('2000-10-26 17:29:12.610000'),
8: Timestamp('2000-10-26 17:29:14.610000'),
9: Timestamp('2000-10-26 17:29:14.770000'),
10: Timestamp('2000-10-26 17:29:16.610000'),
11: Timestamp('2000-10-26 17:29:18.770000'),
12: Timestamp('2000-10-26 17:29:18.770000'),
13: Timestamp('2000-10-26 17:29:28.770000'),
14: Timestamp('2000-10-26 17:29:34.770000'),
15: Timestamp('2000-10-26 17:29:40.770000'),
16: Timestamp('2000-10-26 17:29:46.770000'),
17: Timestamp('2000-10-26 17:32:01.180000'),
18: Timestamp('2000-10-26 17:32:01.180000'),
19: Timestamp('2000-10-26 17:32:35.180000')},
'X': {0: 18,
1: 18,
2: 18,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 18,
9: 18,
10: 18,
11: 18,
12: 18,
13: 18,
14: 19,
15: 19,
16: 19,
17: 19,
18: 19,
19: 19},
'Y': {0: 14,
1: 13,
2: 14,
3: 14,
4: 14,
5: 14,
6: 14,
7: 14,
8: 14,
9: 13,
10: 14,
11: 14,
12: 13,
13: 14,
14: 14,
15: 14,
16: 14,
17: 13,
18: 14,
19: 13},
'a': {0: 0.0,
1: 1.0,
2: 1.0,
3: 1.0,
4: 0.0,
5: 0.0,
6: 1.0,
7: 1.0,
8: 1.0,
9: 1.0,
10: 1.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 1.0,
16: 1.0,
17: 1.0,
18: 0.0,
19: 0.0},
'b': {0: 24.5,
1: 24.0,
2: 23.5,
3: 23.5,
4: 23.5,
5: 26.5,
6: 26.5,
7: 53.5,
8: 62.0,
9: 24.0,
10: 64.5,
11: 64.5,
12: 24.0,
13: 63.5,
14: 62.0,
15: 61.0,
16: 60.0,
17: 25.0,
18: 51.5,
19: 50.0},
'Type': {0: 'medium',
1: 'low',
2: 'medium',
3: 'medium',
4: 'medium',
5: 'medium',
6: 'medium',
7: 'medium',
8: 'medium',
9: 'low',
10: 'medium',
11: 'medium',
12: 'low',
13: 'medium',
14: 'medium',
15: 'medium',
16: 'medium',
17: 'low',
18: 'low',
19: 'medium'},
'Id': {0: 18,
1: 18,
2: 18,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 18,
9: 18,
10: 18,
11: 18,
12: 18,
13: 18,
14: 19,
15: 19,
16: 19,
17: 19,
18: 19,
19: 19}})
td1
预期输出:
Start_Id StartTime End_Id EndTime list_container
18 2000-10-26 10:39:24.310 18 2000-10-26 17:29:18.770 [23.5, 23.5, 26.5, 53.5, 62.0, 64.5]
19 2000-10-26 17:29:40.770 19 2000-10-26 17:32:35.180 [60.0]
通过分析每次迭代前后可能的状态组合,我尝试了以下 for 循环:
def combined_func(td1):
td1['Timestamp'] = pd.to_datetime(td1['Timestamp'])
td1 = td1.sort_values(by=['Id','Timestamp'])
td1 = td1.reset_index(drop=True)
low_on = 0 # Flag to indicate status of low
medium_on = 0 # Flag to indicate status of medium
my_list = []
container_list = []
data = []
time_start = None
start_Id = None
time_end = None
end_Id = None
for i in range(1, len(td1.index)-1):
if (td1.loc[i, 'Id'] == td1.loc[i-1, 'Id']) & (td1.loc[i, 'Id'] == td1.loc[i+1, 'Id']):
if ((not low_on) & (not medium_on)):
if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
b13 = td1.loc[i, 'b']
my_list.append(b13)
medium_on = 1
time_start = td1.loc[i, 'Timestamp']
start_Id = td1.loc[i, 'Id']
print(f"This is start case 1 (start with medium), start_Id: {start_Id}, time_start: {time_start}")
elif ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):
time_start = td1.loc[i, 'Timestamp']
start_Id = td1.loc[i, 'Id']
print(f'This is start case 2 (start with low), start_Id: {start_Id}, time_start: {time_start}')
low_on = 1
else:
continue
elif ((not low_on) & (medium_on)):
if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
b5 = td1.loc[i, 'b']
my_list.append(b5)
if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):
low_on = 1
if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
b7 = td1.loc[i, 'b']
my_list.append(b7)
list_container = my_list
my_list = []
medium_on = 0
time_end = td1.loc[i, 'Timestamp']
end_Id = td1.loc[i, 'Id']
print(f"This is end case 1 (end with medium), end_Fid: {end_Id}, time_end: {time_end}, container_list is {container_list}")
else:
continue
elif ((low_on) & (not medium_on)):
if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
b11 = td1.loc[i, 'b']
my_list.append(b11)
medium_on = 1
if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
time_end = td1.loc[i, 'Timestamp']
end_Id = td1.loc[i, 'Id']
low_on = 0
print(f"This is end case 2 (end with low), end_Id: {end_Id}, time_end: {time_end}, container_list is {my_list}")
else:
continue
elif ((low_on) & (medium_on)):
if (td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium'):
b1 = td1.loc[i, 'b']
my_list.append(b1)
if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
low_on = 0
if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
b3 = td1.loc[i, 'b']
my_list.append(b3)
list_container = my_list
my_list = []
medium_on = 0
else:
continue
data.append([start_Id, time_start, end_Id, time_end, list_container])
else:
continue
else:
continue
data_table1 = pd.DataFrame(data, columns= ["Start_Id", "StartTime", "End_Id", "EndTime", "list_container"])
return data_table1
output = combined_func(td1)
output
它返回:
This is start case 2 (start with low), start_Id: 18, time_start: 2000-10-26 10:39:24.310000
This is end case 2 (end with low), end_Id: 18, time_end: 2000-10-26 17:29:18.770000, container_list is []
This is start case 1 (start with medium), start_Id: 19, time_start: 2000-10-26 17:29:40.770000
Start_Id StartTime End_Id EndTime list_container
0 18 2000-10-26 10:39:24.310 None None [23.5, 23.5, 23.5]
1 18 2000-10-26 10:39:24.310 None None [26.5, 53.5, 62.0, 64.5, 64.5]
不知何故End_Id
,EndTime
缺少并且 list_container 值也关闭了。我不确定哪些步骤出了问题。非常感谢任何建议。
解决方案
我找不到比X
根据您的描述分组并为每个返回值创建特定逻辑更好的方法了。
def times(df):
start_time = df.loc[df.a == 1, 'Timestamp'].iloc[0]
end_time = pd.NaT
if(df.loc[df.a == 0, 'Type'].nunique() == 2):
end_time = (
df.loc[df.a == 0, ['Timestamp', 'Type']]
.drop_duplicates('Type', keep='last')
.Timestamp
.iloc[-1]
)
if (pd.notnull([start_time, end_time]).all()):
temp = df[(df.Timestamp > start_time) & (df.Timestamp < end_time)]
start_id, end_id = temp.Id.iloc[[0, -1]].to_list()
list_container = temp[temp.a == 1].b.to_list()
return pd.Series({
'Start_Id': start_id,
'StartTime': start_time,
'End_Id': end_id,
'EndTime': end_time,
'list_container': list_container
})
results = td1.groupby('X').apply(times)
results
# Start_Id StartTime End_Id EndTime list_container
# X
# 18 18 2000-10-26 10:39:24.310 18 2000-10-26 17:29:28.770 [23.5, 23.5, 26.5, 53.5, 62.0, 24.0, 64.5]
# 19 19 2000-10-26 17:29:40.770 19 2000-10-26 17:32:35.180 [60.0, 25.0]
推荐阅读
- html - 如何更改引导程序 4 中导航栏的引导程序汉堡图标?
- xbrl - 如何确定 XBRL 实例文件中定义的上下文的含义?
- sqlite - 删除 Sqlite 中数据已记录为字符串的日期范围内的数据
- java - JPA 单个查询以获取嵌套集合
- caching - Kubernetes 中的 Redis - Sidecar 还是客户端-服务器模型?
- c# - 如何从下拉列表中获取价值?
- laravel - Homestead - 通过 http://localhost:PORT 访问
- javascript - 为什么在 Node.js 中使用 readFileAsync 读取带有 json 的文件?
- ceph - CEPH 的 OSD_OBJECTSTORE 哪个输出正确
- c# - 调试时直接跳转到Method