首页 > 解决方案 > 查找具有特定值的相邻行并获取时间差 python pandas

问题描述

此数据框包含两列。一种用于时间,一种用于信号。我想要做的是查找行,如果超过 3 个相邻行的信号等于 2,则获取该组的时间消耗。最后,我想获得这些组的数量和总时间消耗来计算每组的平均值

例子:

在此处输入图像描述

所以标记的右刻度有超过 3 个相邻行,包含信号 2。如果相邻信号 2 的数量小于或等于 3,我们不认为它是正确的。

例如:第一组时间消耗是该组的最后一个条目 - 第一个条目。所以应该是 (2021-08-26 07:34:23)-(2021-08-26 07:33:41)=42 秒。

因此,明智的做法最终需要了解这些组的数量及其总时间消耗。

数据:

{'Time': {1: '2021-08-26 07:30:23',
  2: '2021-08-26 07:30:32',
  4: '2021-08-26 07:30:37',
  9: '2021-08-26 07:30:44',
  10: '2021-08-26 07:30:55',
  11: '2021-08-26 07:31:05',
  12: '2021-08-26 07:31:15',
  13: '2021-08-26 07:31:26',
  14: '2021-08-26 07:31:36',
  15: '2021-08-26 07:31:47',
  16: '2021-08-26 07:31:57',
  17: '2021-08-26 07:32:07',
  18: '2021-08-26 07:32:18',
  19: '2021-08-26 07:32:28',
  20: '2021-08-26 07:32:38',
  21: '2021-08-26 07:32:49',
  22: '2021-08-26 07:32:59',
  23: '2021-08-26 07:33:09',
  24: '2021-08-26 07:33:20',
  26: '2021-08-26 07:33:27',
  28: '2021-08-26 07:33:31',
  31: '2021-08-26 07:33:41',
  33: '2021-08-26 07:33:47',
  38: '2021-08-26 07:34:20',
  40: '2021-08-26 07:34:22',
  42: '2021-08-26 07:34:23',
  45: '2021-08-26 07:34:25',
  51: '2021-08-26 07:35:23',
  54: '2021-08-26 07:35:33',
  56: '2021-08-26 07:35:34',
  60: '2021-08-26 07:35:57',
  62: '2021-08-26 07:35:59',
  64: '2021-08-26 07:36:00',
  66: '2021-08-26 07:36:01',
  68: '2021-08-26 07:36:02',
  74: '2021-08-26 07:37:02',
  76: '2021-08-26 07:37:06',
  78: '2021-08-26 07:37:07',
  80: '2021-08-26 07:37:09',
  84: '2021-08-26 07:37:39',
  86: '2021-08-26 07:37:43',
  89: '2021-08-26 07:37:55',
  91: '2021-08-26 07:37:57',
  92: '2021-08-26 07:37:58',
  94: '2021-08-26 07:37:59',
  95: '2021-08-26 07:38:09',
  96: '2021-08-26 07:38:20',
  97: '2021-08-26 07:38:30',
  98: '2021-08-26 07:38:40',
  101: '2021-08-26 07:38:53',
  103: '2021-08-26 07:38:55',
  105: '2021-08-26 07:38:56',
  107: '2021-08-26 07:39:01',
  109: '2021-08-26 07:39:08',
  111: '2021-08-26 07:39:09',
  116: '2021-08-26 07:39:50',
  118: '2021-08-26 07:39:51',
  120: '2021-08-26 07:39:54',
  122: '2021-08-26 07:39:56',
  123: '2021-08-26 07:39:56',
  127: '2021-08-26 07:40:01',
  146: '2021-08-26 07:43:04',
  147: '2021-08-26 07:43:14',
  149: '2021-08-26 07:43:22',
  155: '2021-08-26 07:44:05',
  157: '2021-08-26 07:44:06',
  159: '2021-08-26 07:44:08',
  161: '2021-08-26 07:44:11',
  164: '2021-08-26 07:44:24',
  166: '2021-08-26 07:44:26',
  168: '2021-08-26 07:44:39',
  170: '2021-08-26 07:44:40',
  172: '2021-08-26 07:44:43',
  174: '2021-08-26 07:44:44',
  176: '2021-08-26 07:44:45',
  178: '2021-08-26 07:44:47',
  181: '2021-08-26 07:45:07',
  184: '2021-08-26 07:45:27',
  186: '2021-08-26 07:45:28',
  188: '2021-08-26 07:45:34',
  190: '2021-08-26 07:45:37',
  192: '2021-08-26 07:45:42',
  194: '2021-08-26 07:45:46',
  196: '2021-08-26 07:45:51',
  197: '2021-08-26 07:46:02',
  199: '2021-08-26 07:46:06',
  201: '2021-08-26 07:46:08',
  203: '2021-08-26 07:46:09',
  205: '2021-08-26 07:46:12',
  207: '2021-08-26 07:46:17',
  209: '2021-08-26 07:46:20',
  213: '2021-08-26 07:46:52',
  214: '2021-08-26 07:47:02',
  215: '2021-08-26 07:47:13',
  216: '2021-08-26 07:47:23',
  217: '2021-08-26 07:47:33',
  218: '2021-08-26 07:47:44',
  219: '2021-08-26 07:47:54',
  220: '2021-08-26 07:48:04',
  221: '2021-08-26 07:48:15'},
 'Signal': {1: 3,
  2: 3,
  4: 3,
  9: 3,
  10: 3,
  11: 3,
  12: 3,
  13: 3,
  14: 3,
  15: 3,
  16: 3,
  17: 3,
  18: 3,
  19: 3,
  20: 3,
  21: 3,
  22: 3,
  23: 3,
  24: 3,
  26: 3,
  28: 3,
  31: 2,
  33: 2,
  38: 2,
  40: 2,
  42: 2,
  45: 3,
  51: 3,
  54: 2,
  56: 2,
  60: 2,
  62: 2,
  64: 2,
  66: 2,
  68: 3,
  74: 3,
  76: 3,
  78: 2,
  80: 2,
  84: 2,
  86: 2,
  89: 2,
  91: 2,
  92: 3,
  94: 3,
  95: 3,
  96: 3,
  97: 3,
  98: 3,
  101: 3,
  103: 2,
  105: 3,
  107: 2,
  109: 2,
  111: 2,
  116: 2,
  118: 2,
  120: 2,
  122: 2,
  123: 3,
  127: 3,
  146: 3,
  147: 3,
  149: 3,
  155: 3,
  157: 3,
  159: 3,
  161: 3,
  164: 3,
  166: 3,
  168: 3,
  170: 3,
  172: 3,
  174: 3,
  176: 2,
  178: 2,
  181: 2,
  184: 2,
  186: 2,
  188: 2,
  190: 2,
  192: 2,
  194: 3,
  196: 3,
  197: 3,
  199: 3,
  201: 3,
  203: 2,
  205: 2,
  207: 3,
  209: 3,
  213: 3,
  214: 3,
  215: 3,
  216: 3,
  217: 3,
  218: 3,
  219: 3,
  220: 3,
  221: 3}}

标签: pythonpandasdataframeconditional-statementspandas-groupby

解决方案


假设您需要 3 个实际行,我调用reset_index()了数据框。

# Reset the index (if you don't want this, logic for `seq_start` and `seq_end` will need to change)
df.reset_index(inplace=True)

# Accumulate sequences of interest
seqs = []
seq_start = None

for index,row in df.iterrows():
    # Is the current row of interest? 
    of_interest = row.Signal == 2
    if of_interest :
        if seq_start is None :
            # First interesting row in a potential sequence
            seq_start = index
    else:
        # This row is no longer of interest, so see if a sequence has occurred...
        if seq_start is not None :
            # This row is not of interest. The sequence ended with previous row.
            seq_end = index - 1
            # Enough rows to be a sequence?
            if seq_end - seq_start >= 3 :
                seqs.append([seq_start, seq_end])
        seq_start = None

print(f"Number of sequences of len >= 3 matching predicate = {len(seqs)}")

# Calculate the timedelta of a sequence, given its starting and ending time
def duration(start, end):
    end_time = df.iloc[end,:].Time
    start_time = df.iloc[start,:].Time
    # parse form '2021-08-26 07:43:04' to datetime
    end_time = datetime.strptime(end_time, '%Y-%m-%d %H:%M:%S')
    start_time = datetime.strptime(start_time, '%Y-%m-%d %H:%M:%S')
    return end_time - start_time

# Calculate duration of each sequence
durs = [duration(start, end) for start, end in seqs]

print(f"Total duration: {sum(durs, timedelta(0,0))}") 

对于示例数据,打印:

Number of sequences of len >= 3 matching predicate = 5
Total duration: 0:03:52

推荐阅读