python - 查找具有特定值的相邻行并获取时间差 python pandas
问题描述
此数据框包含两列。一种用于时间,一种用于信号。我想要做的是查找行,如果超过 3 个相邻行的信号等于 2,则获取该组的时间消耗。最后,我想获得这些组的数量和总时间消耗来计算每组的平均值。
例子:
所以标记的右刻度有超过 3 个相邻行,包含信号 2。如果相邻信号 2 的数量小于或等于 3,我们不认为它是正确的。
例如:第一组时间消耗是该组的最后一个条目 - 第一个条目。所以应该是 (2021-08-26 07:34:23)-(2021-08-26 07:33:41)=42 秒。
因此,明智的做法最终需要了解这些组的数量及其总时间消耗。
数据:
{'Time': {1: '2021-08-26 07:30:23',
2: '2021-08-26 07:30:32',
4: '2021-08-26 07:30:37',
9: '2021-08-26 07:30:44',
10: '2021-08-26 07:30:55',
11: '2021-08-26 07:31:05',
12: '2021-08-26 07:31:15',
13: '2021-08-26 07:31:26',
14: '2021-08-26 07:31:36',
15: '2021-08-26 07:31:47',
16: '2021-08-26 07:31:57',
17: '2021-08-26 07:32:07',
18: '2021-08-26 07:32:18',
19: '2021-08-26 07:32:28',
20: '2021-08-26 07:32:38',
21: '2021-08-26 07:32:49',
22: '2021-08-26 07:32:59',
23: '2021-08-26 07:33:09',
24: '2021-08-26 07:33:20',
26: '2021-08-26 07:33:27',
28: '2021-08-26 07:33:31',
31: '2021-08-26 07:33:41',
33: '2021-08-26 07:33:47',
38: '2021-08-26 07:34:20',
40: '2021-08-26 07:34:22',
42: '2021-08-26 07:34:23',
45: '2021-08-26 07:34:25',
51: '2021-08-26 07:35:23',
54: '2021-08-26 07:35:33',
56: '2021-08-26 07:35:34',
60: '2021-08-26 07:35:57',
62: '2021-08-26 07:35:59',
64: '2021-08-26 07:36:00',
66: '2021-08-26 07:36:01',
68: '2021-08-26 07:36:02',
74: '2021-08-26 07:37:02',
76: '2021-08-26 07:37:06',
78: '2021-08-26 07:37:07',
80: '2021-08-26 07:37:09',
84: '2021-08-26 07:37:39',
86: '2021-08-26 07:37:43',
89: '2021-08-26 07:37:55',
91: '2021-08-26 07:37:57',
92: '2021-08-26 07:37:58',
94: '2021-08-26 07:37:59',
95: '2021-08-26 07:38:09',
96: '2021-08-26 07:38:20',
97: '2021-08-26 07:38:30',
98: '2021-08-26 07:38:40',
101: '2021-08-26 07:38:53',
103: '2021-08-26 07:38:55',
105: '2021-08-26 07:38:56',
107: '2021-08-26 07:39:01',
109: '2021-08-26 07:39:08',
111: '2021-08-26 07:39:09',
116: '2021-08-26 07:39:50',
118: '2021-08-26 07:39:51',
120: '2021-08-26 07:39:54',
122: '2021-08-26 07:39:56',
123: '2021-08-26 07:39:56',
127: '2021-08-26 07:40:01',
146: '2021-08-26 07:43:04',
147: '2021-08-26 07:43:14',
149: '2021-08-26 07:43:22',
155: '2021-08-26 07:44:05',
157: '2021-08-26 07:44:06',
159: '2021-08-26 07:44:08',
161: '2021-08-26 07:44:11',
164: '2021-08-26 07:44:24',
166: '2021-08-26 07:44:26',
168: '2021-08-26 07:44:39',
170: '2021-08-26 07:44:40',
172: '2021-08-26 07:44:43',
174: '2021-08-26 07:44:44',
176: '2021-08-26 07:44:45',
178: '2021-08-26 07:44:47',
181: '2021-08-26 07:45:07',
184: '2021-08-26 07:45:27',
186: '2021-08-26 07:45:28',
188: '2021-08-26 07:45:34',
190: '2021-08-26 07:45:37',
192: '2021-08-26 07:45:42',
194: '2021-08-26 07:45:46',
196: '2021-08-26 07:45:51',
197: '2021-08-26 07:46:02',
199: '2021-08-26 07:46:06',
201: '2021-08-26 07:46:08',
203: '2021-08-26 07:46:09',
205: '2021-08-26 07:46:12',
207: '2021-08-26 07:46:17',
209: '2021-08-26 07:46:20',
213: '2021-08-26 07:46:52',
214: '2021-08-26 07:47:02',
215: '2021-08-26 07:47:13',
216: '2021-08-26 07:47:23',
217: '2021-08-26 07:47:33',
218: '2021-08-26 07:47:44',
219: '2021-08-26 07:47:54',
220: '2021-08-26 07:48:04',
221: '2021-08-26 07:48:15'},
'Signal': {1: 3,
2: 3,
4: 3,
9: 3,
10: 3,
11: 3,
12: 3,
13: 3,
14: 3,
15: 3,
16: 3,
17: 3,
18: 3,
19: 3,
20: 3,
21: 3,
22: 3,
23: 3,
24: 3,
26: 3,
28: 3,
31: 2,
33: 2,
38: 2,
40: 2,
42: 2,
45: 3,
51: 3,
54: 2,
56: 2,
60: 2,
62: 2,
64: 2,
66: 2,
68: 3,
74: 3,
76: 3,
78: 2,
80: 2,
84: 2,
86: 2,
89: 2,
91: 2,
92: 3,
94: 3,
95: 3,
96: 3,
97: 3,
98: 3,
101: 3,
103: 2,
105: 3,
107: 2,
109: 2,
111: 2,
116: 2,
118: 2,
120: 2,
122: 2,
123: 3,
127: 3,
146: 3,
147: 3,
149: 3,
155: 3,
157: 3,
159: 3,
161: 3,
164: 3,
166: 3,
168: 3,
170: 3,
172: 3,
174: 3,
176: 2,
178: 2,
181: 2,
184: 2,
186: 2,
188: 2,
190: 2,
192: 2,
194: 3,
196: 3,
197: 3,
199: 3,
201: 3,
203: 2,
205: 2,
207: 3,
209: 3,
213: 3,
214: 3,
215: 3,
216: 3,
217: 3,
218: 3,
219: 3,
220: 3,
221: 3}}
解决方案
假设您需要 3 个实际行,我调用reset_index()
了数据框。
# Reset the index (if you don't want this, logic for `seq_start` and `seq_end` will need to change)
df.reset_index(inplace=True)
# Accumulate sequences of interest
seqs = []
seq_start = None
for index,row in df.iterrows():
# Is the current row of interest?
of_interest = row.Signal == 2
if of_interest :
if seq_start is None :
# First interesting row in a potential sequence
seq_start = index
else:
# This row is no longer of interest, so see if a sequence has occurred...
if seq_start is not None :
# This row is not of interest. The sequence ended with previous row.
seq_end = index - 1
# Enough rows to be a sequence?
if seq_end - seq_start >= 3 :
seqs.append([seq_start, seq_end])
seq_start = None
print(f"Number of sequences of len >= 3 matching predicate = {len(seqs)}")
# Calculate the timedelta of a sequence, given its starting and ending time
def duration(start, end):
end_time = df.iloc[end,:].Time
start_time = df.iloc[start,:].Time
# parse form '2021-08-26 07:43:04' to datetime
end_time = datetime.strptime(end_time, '%Y-%m-%d %H:%M:%S')
start_time = datetime.strptime(start_time, '%Y-%m-%d %H:%M:%S')
return end_time - start_time
# Calculate duration of each sequence
durs = [duration(start, end) for start, end in seqs]
print(f"Total duration: {sum(durs, timedelta(0,0))}")
对于示例数据,打印:
Number of sequences of len >= 3 matching predicate = 5
Total duration: 0:03:52
推荐阅读
- ruby-on-rails - 提交后禁用功能规范检查表单按钮的问题
- amadeus - 无法在 Amadeus Flight Offers API 中使用 includedCheckedBags 属性
- ruby-on-rails - RVM - Ruby 更新 - 无法自动检测当前活动的 RVM gem 集名称
- angular - Angular 6 Error InvalidPipeArgument: '[object Object]' for pipe 'AsyncPipe' 尝试访问嵌套数组时
- c++ - 应用程序(有时)在 PlaySound() 上崩溃
- python - 动态更改 QLabel 文本不起作用
- node.js - 规划器返回错误:无法找到 $geoNear 查询的索引
- ansible - 使用 awx/ansible 中的 requirements.yml 为角色签出相同的分支
- performance - Spark 读取分区 - 资源成本分析
- javascript - 标记预订/保留的时间段