python - How to optimally compare if element is in a list of ranges?
问题描述
I have two 2D lists with one million+ entries. List1
is full of ranges (e.g. [100, 25]
meaning the range starts at 100
and has a span of 25
, or 100
to 125
) and List2
is full of plot points. I have to count how many ranges each point fits into.
Since I'm dealing with such a large amount of data, I began with a binary search function to make searching for a starting point easier, rather than cycling through the whole list one by one. Except I've modified the search to return the closest index from List1
that is ABOVE my plot point, because the plot point is not guaranteed to be an element in List1
.
I end up doing a comparison between each plot point in List2
and List1
, starting at the index I got from the Binary Search, moving downwards in List1
until the plot point is no longer within a range. Although this works, this is very time consuming and inefficient.
def findRange(List1, List2):
for plot in List2:
count = 0
startIndex = binarySearch(List1, plot)
while plot <= List1[startIndex - 1][0] + 250:
if List1[startIndex][0] < plot[0] < List1[startIndex][0] + List1[startIndex][1]:
count += 1
startIndex -= 1
plot[1] = count
return List2
I have the ... + 250
in the while
loop because the range distance can be anywhere from 0
to 250
, so I use that to determine whether or not further indices might contain the point.
I've looked at converting List1
into a dictionary where the key is the summed distance, and the value is the number of repeats found. I thought that by getting rid of repeating ranges, I could cut back on some of the overhead. I couldn't find a way to get the correct solution this way though.
Is there any suggestions you can direct me in to better optimize my algorithm? I can only use the standard library.
EDIT:
List1 = [
[1233, 120],
[1233, 80],
[1490, 50],
[1789, 220],
[1800, 250]
]
List2 = [
[1300, 0],
[1450, 0],
[1490, 0],
[2000, 0]
]
I am supposed to update the zeros in List2
with the count of how many ranges the plot point falls into. So an output would be something like this:
List2 = [
[1300, 2], # this point can be found in List1[0] and List1[1] because 1300 is between 1233-1353 and 1233-1313
[1450, 0], # this plot can be found in none of ranges
[1490, 1], # this plot can be found in List1[2]
[2000, 2] # this plot can be found in List1[2] and List1[3]
]
As you can see, the count index is updated in List2
from 0 to the number of ranges contains the point.
解决方案
版本 1(二分查找)
您最好搜索List2
元素List1
而不是相反。假设两个列表都已排序,您总是会缩小搜索空间。此外,您将避免在假设范围大小上限时必须执行的潜在浪费操作。
附带说明一下,二进制搜索是在bisect
模块中的 python 中实现的。我建议同时使用bisect_left
(用于范围开始)和bisect_right
范围结束。
from bisect import bisect_left, bisect_right
start = 0
for rnge in List1:
start = bisect_left(List2, [rnge[0], 0], start)
end = bisect_right(List2, [rnge[0] + rnge[1] - 1, 0], start)
for plot_index in range(start, end):
List2[plot_index][1] += 1
您的示例的结果是
>>> List2
[[1300, 2], [1450, 0], [1490, 1], [2000, 2]]
版本 2(一次性线性搜索)
话虽如此,一个更好的解决方案浮现在脑海中。您可以一次通过两个列表同时完成整个操作。实现这一点有两个想法。
首先,考虑如果您有两个可以查看的迭代器会发生什么。您始终可以检查哪个更大,哪个更小,并增加适当的一个,直到另一个的头部更大或更小。
其次,您可以使用 List1,将每个范围的开始和结束都转换为索引,并对所有这些索引进行排序。但是,您仍然必须跟踪哪些索引表示进入范围,哪些表示退出范围。事实上,您可以想象输入一个范围会增加应该分配给您在其中遇到的所有图的数字,而退出一个范围会减少该数字。
让我们从List1
使用第二个想法进行转换开始。它将成为一个对列表,其中第一个元素是索引,第二个元素是 +1 或 -1,具体取决于索引是开始还是结束:
actions = []
for rnge in List1:
actions.append([rnge[0], +1])
actions.append([rnge[0] + rnge[1], -1])
actions.sort()
现在您可以沿着两个列表 (action
和List2
) 前进,跟踪从 的增量的运行总和action
。如果元素 fromList2
的键小于 from action
,则将当前增量分配给 fromList2
并继续。如果一个 fromaction
有一个较小或相等的键,您可以通过更新增量来处理它(即,进入或退出一个范围):
end = (List2[-1] + 1, 0) # Key to ensure that increment stops changing once actions runs out
it = iter(actions)
key, inc = next(it, end)
increment = 0
for plot in List2: # Keep plot as a list, since it needs to be incremented
while key <= plot[0]:
increment += inc
key, inc = next(it, end)
plot[1] += increment
虽然这种方法有两个部分,但请注意它只进行两次线性传递List1
和一次传递List2
。没有实际的搜索,只有比较。该调用next(it, end)
确保所有的List2
都得到完全处理,无论如何不增加iterator
一次actions
已经用完。
和以前一样,结果是
>>> List2
[[1300, 2], [1450, 0], [1490, 1], [2000, 2]]
推荐阅读
- c# - 如何通过在移动设备上触摸来更改对象属性 (UNITY) (2D)
- ios - 如何使用 AVFoundation (>1 min) 录制长视频?
- .net - 只知道错误地址时如何查找源代码
- docker - 如何使用存储在卷中的文件创建新容器
- javascript - 将 JavaScript 变量注入 JSON 文件 (Node.js)
- javascript - 重定向到某个页面然后执行一个事件或动作
- python - 如何在 python 中使用 for 循环创建多个数据框
- sql-server - 输出具有多个属于不同项目组配置的销售项目的 SQL 单个事务
- c++ - 什么保证两个不相关线程中的不同不相关对象没有(不可避免的)竞争条件?
- java - 排序坐标系