python - 给定边,如何以向量化的方式找到由两条边组成的路线?
问题描述
我有一系列城镇及其邻居。我想获得一组所有具有至少一条由两条不同边组成的路线的城镇对。有没有一种矢量化的方式来做到这一点?如果不是,为什么?例如:edges[3,0], [0,4], [5,0]
有一个事件节点 0,因此[3,4], [4,5], [3,5]
可以确定可以在路线中连接的城镇对,例如3-0-4
:4-0-5
和3-0-5
。它们由两条边组成。
输入示例:np.array([[3,0], [0,4], [5,0], [2,1], [1,4], [2,3], [5,2]])
预期输出:(array([4,3], [4,5], [3,5], [4,2], [1,3], [1,5], [3,5], [0,2], [0,1], [0,2])
不用担心顺序不同,任何边缘方向反转或有重复)
到目前为止我已经做了:
from itertools import chain, combinations
def get_incidences(roads):
roads = np.vstack([roads, roads[:,::-1]])
roads_sorted = roads[np.argsort(roads[:,0])]
marker_idx = np.flatnonzero(np.diff(roads_sorted[:,0]))+1
source = roads_sorted[np.r_[marker_idx-1,-1],0]
target = np.split(roads_sorted[:,1], marker_idx)
return source, target
def get_combinations_chain(target):
#I know this could be improved with `np.fromiter`
return np.array(list(chain(*[combinations(n,2) for n in target])))
def get_combinations_triu(target):
def combs(t):
x, y = np.triu_indices(len(t),1)
return np.transpose(np.array([t[x], t[y]]))
return np.concatenate([combs(n) for n in target])
roads = np.array([[3,0], [0,4], [5,0], [2,1], [1,4], [2,3], [5,2]])
>>> get_incidences(roads)
(array([0, 1, 2, 3, 4, 5]),
[array([4, 3, 5]),
array([4, 2]),
array([1, 3, 5]),
array([0, 2]),
array([0, 1]),
array([0, 2])])
>>> get_combinations_chain(get_incidences(roads)[1])
array([[4, 3], [4, 5], [3, 5], [4, 2], [1, 3], [1, 5], [3, 5], [0, 2], [0, 1], [0, 2]])
>>> get_combinations_triu(get_incidences(roads)[1])
array([[4, 3], [4, 5], [3, 5], [4, 2], [1, 3], [1, 5], [3, 5], [0, 2], [0, 1], [0, 2]])
最后两个给出了预期的输出,但它们需要列表理解。是否可以将此计算向量化:
np.concatenate([combs(n) for n in target])
更新我以一种可能的矢量化方式结束,但我需要重新组织输入数据(的输出get_incidences
):
INPUT:
target: [array([4, 3, 5]), array([4, 2]), array([1, 3, 5]), array([0, 2]), array([0, 1]), array([0, 2])]
stream: [4 3 5 4 2 1 3 5 0 2 0 1 0 2]
lengths: [3 2 3 2 2 2]
OUTPUT:
array([[3, 4], [4, 5], [3, 5], [2, 4], [1, 3], [1, 5], [3, 5], [0, 2], [0, 1], [0, 2]])
它似乎也比所有组合的直接连接更快:
def get_incidences(roads):
roads = np.vstack([roads, roads[:,::-1]])
roads_sorted = roads[np.argsort(roads[:,0])]
marker_idx = np.flatnonzero(np.diff(roads_sorted[:,0]))+1
lengths = np.diff(marker_idx, prepend=0, append=len(roads_sorted))
stream = roads_sorted[:,1]
target = np.split(stream, marker_idx)
return target, stream, lengths
def get_combinations_vectorized(data):
target, stream, lengths = data
idx1 = np.concatenate(np.repeat(target, lengths))
idx2 = np.repeat(stream, np.repeat(lengths, lengths))
return np.array([idx1, idx2]).T[idx1 < idx2]
def get_combinations_triu(data):
target, stream, lengths = data
def combs(t):
x, y = np.triu_indices(len(t),1)
return np.transpose(np.array([t[x], t[y]]))
return np.concatenate([combs(n) for n in target])
def get_combinations_chain(data):
target, stream, lengths = data
return np.array(list(chain(*[combinations(n,2) for n in target])))
def get_combinations_scott(data):
target, stream, lengths = data
return np.array([x for i in target for x in combinations(i,2)])
def get_combinations_index(data):
target, stream, lengths = data
index = np.fromiter(chain.from_iterable(chain.from_iterable(combinations(n,2) for n in target)),
dtype=int, count=np.sum(lengths*(lengths-1)))
return index.reshape(-1,2)
roads = np.array([[64, 53], [94, 90], [24, 60], [45, 44], [83, 17], [10, 88], [14, 6], [56, 93], [98, 93], [86, 77], [12, 85], [58, 2], [19, 80], [48, 26], [11, 51], [16, 83], [45, 96], [35, 54], [47, 23], [81, 57], [52, 34], [88, 11], [18, 4], [35, 90], [41, 45], [2, 7], [58, 68], [58, 11], [46, 38], [32, 93], [44, 41], [26, 39], [20, 58], [44, 4], [8, 96], [74, 71], [34, 35], [91, 72], [28, 58], [53, 73], [66, 5], [84, 97], [24, 29], [43, 63], [96, 63], [20, 57], [1, 74], [4, 89], [10, 89], [98, 22]])
data = get_incidences(roads)
%timeit get_combinations_vectorized(data)
%timeit get_combinations_chain(data)
%timeit get_combinations_triu(data)
%timeit get_combinations_scott(data)
%timeit get_combinations_index(data)
92 µs ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
123 µs ± 3.67 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.8 ms ± 9.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
126 µs ± 2.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
140 µs ± 4.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
但是,这在很大程度上取决于数据。时间安排roads = np.array(list(combinations(range(100),2)))
44.2 ms ± 4.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
277 ms ± 8.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
21.2 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
369 ms ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
43.2 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
解决方案
您可以使用 networkx 库:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from itertools import combinations
a = np.array([[3,0], [0,4], [5,0], [2,1], [1,4], [2,3], [5,2]])
G = nx.Graph()
G.add_edges_from(a)
#Creates this newtork
nx.draw_networkx(G)
# Create pairs of all nodes in network
c = combinations(G.nodes, 2)
# Find all routes between each pair in the network
routes = [list(nx.all_simple_paths(G, i, j, cutoff=2))[0] for i, j in c]
# Select only routes with three nodes/two edges the show first and last node
paths_2_edges = [(i[0], i[-1]) for i in routes if len(i) == 3]
print(paths_2_edges)
输出:
[(3, 4), (3, 5), (3, 1), (0, 2), (0, 1), (4, 5), (4, 2), (5, 1)]
根据评论
将此语句向量化np.concatenate([combs(n) for n in target])
:
为了t = get_incidences(roads)[1]
s2 = get_combinations_triu(t)
输出 s2:
array([[4, 3],
[4, 5],
[3, 5],
[4, 2],
[1, 3],
[1, 5],
[3, 5],
[0, 2],
[0, 1],
[0, 2]])
%timeit get_combinations_triu(t)
每个循环 96.9 µs ± 3.44 µs(平均值 ± 标准偏差。7 次运行,每次 10000 次循环)
然后
s1 = np.array([x for i in t for x in combinations(i,2)])
输出 s1:
array([[4, 3],
[4, 5],
[3, 5],
[4, 2],
[1, 3],
[1, 5],
[3, 5],
[0, 2],
[0, 1],
[0, 2]])
而且, (s1 == s2).all()
True
时间:
%timeit np.array([x for i in t for x in list(combinations(i,2))])
每个循环 14.7 µs ± 577 ns(平均值 ± 标准偏差。7 次运行,每次 100000 次循环)
推荐阅读
- html - 如何在输入文本中添加搜索和麦克风按钮?
- c# - 编写构造函数来初始化属性 csharp?
- video - 视频静音时,MediaRecorder 在 Chrome 中给出 0 个字节
- javascript - Web API 是否在其他线程中运行?
- hive - Impala/Hive 函数获取字符串的子字符串
- html - 将数据传递给 Blazor 中的模式
- java - 在构建 Maven Web 应用程序时创建一个 zip 文件并将该文件移动到生成的 .war 文件
- reactjs - 错误:对象作为 React 子项无效。如果您打算渲染一组子项,请使用数组而不是使用 Reactjs
- python - Telegram API 偶尔会在请求中抛出与 SSL 相关的异常
- graphql - Gatsby:如何列出所有节点?