Filtering overlapping data out of a data array

Problem description

I have an array of data like this:

----------------------------------------
| Name | Start time | End time | Count |
----------------------------------------
|  A   | 5:00       | 5:30     | 10    |
|  B   | 5:00       | 5:45     | 20    |
|  C   | 5:36       | 5:50     | 30    |
|  D   | 5:43       | 5:55     | 40    |
|  E   | 5:56       | 6:00     | 50    |
----------------------------------------

I want to:

1. Sum the Count of A and B, since they overlap each other.
2. Sum the Count of B, C and D, since they all overlap one another.
3. Store E in a separate array, since it does not overlap with anything.

Expected output, as 2 arrays:

Overlap array
A, B => 30
B, C, D => 90

Non-overlapping array
E => 50

Tags: python

Solution

A simple approach looks like this:

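# l is a list of dicts with 'start', 'end' and 'count' keys.
# 'start' and 'end' must be comparable values (e.g. minutes since midnight).
for row in l:
  row['overlapping_count'] = 0
  for otherrow in l:
    # Two closed intervals overlap when each one starts before the other ends.
    # A row overlaps itself, so its own count is included in the sum.
    if row['start'] <= otherrow['end'] and otherrow['start'] <= row['end']:
      row['overlapping_count'] += otherrow['count']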

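This computes an overlapping count for each row. Note, however, that it compares every pair of rows, so it is an O(n^2) algorithm and will not scale well as the amount of data grows.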
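If the pairwise loop becomes too slow, one possible alternative is to sort the start and end times and sweep over them once. The sketch below is not the method from the answer above. It assumes the expected output in the question means "maximal groups of rows that all overlap one another", that times are 'H:MM' strings within a single day, and that intervals are closed (touching endpoints count as overlapping). The dict keys ('name', 'start', 'end', 'count') and the helper names to_minutes and overlap_groups are hypothetical, chosen only for illustration.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def overlap_groups(rows):
    """Split rows into maximal overlapping groups and isolated rows.

    Returns (overlapping, isolated), each a list of (names, total_count).
    """
    START, END = 0, 1  # START sorts first, so rows touching at one time overlap
    events = []
    for i, row in enumerate(rows):
        events.append((to_minutes(row["start"]), START, i))
        events.append((to_minutes(row["end"]), END, i))
    events.sort()

    active = {}    # index -> row, for rows that are currently open
    grew = False   # True if a row was opened since the last reported group
    overlapping, isolated = [], []
    for _, kind, i in events:
        if kind == START:
            active[i] = rows[i]
            grew = True
            continue
        if grew:
            # A row is about to close and the open set grew since the last
            # report, so the open set cannot be extended: report it once.
            names = [r["name"] for r in active.values()]
            total = sum(r["count"] for r in active.values())
            target = overlapping if len(names) > 1 else isolated
            target.append((names, total))
            grew = False
        del active[i]
    return overlapping, isolated


# Sample data copied from the table in the question.
data = [
    {"name": "A", "start": "5:00", "end": "5:30", "count": 10},
    {"name": "B", "start": "5:00", "end": "5:45", "count": 20},
    {"name": "C", "start": "5:36", "end": "5:50", "count": 30},
    {"name": "D", "start": "5:43", "end": "5:55", "count": 40},
    {"name": "E", "start": "5:56", "end": "6:00", "count": 50},
]
overlapping, isolated = overlap_groups(data)
print(overlapping)  # [(['A', 'B'], 30), (['B', 'C', 'D'], 90)]
print(isolated)     # [(['E'], 50)]
```

Apart from building the output groups, the work here is dominated by sorting the 2n events, so it should grow roughly as O(n log n) rather than O(n^2). If touching endpoints should not count as overlapping, swapping the values of START and END so that end events sort first is one way to get that behaviour.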

