How can I efficiently compute shortest paths in R?

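Problem description

I have more than 3,500 origins and more than 3,500 destinations, connected through more than 54,000 links and 24,000 nodes. I am modeling a real street network (the Chicago metropolitan area) in R using igraph and cppRouting. The code below performs an "All-or-Nothing traffic assignment (AON)", which has to be executed more than 40 times to reach network equilibrium. Each AON run currently takes more than 10 minutes, which is far too long. Apart from parallel computing, I would appreciate any suggestion for reducing the execution time of the following source code: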

  library(igraph)
  library(cppRouting)
  library(dplyr)   # left_join()
  library(tictoc)  # tic()

  demand_matrix <- demand_matrix[order(demand_matrix$ORG, demand_matrix$DEST), ]

  tic()
  origins <- unique(demand_matrix$ORG)
  for (i in seq_along(origins)) {
    # I think I have to iterate over every origin
    org <- origins[i]
    destinations <- demand_matrix$DEST[demand_matrix$ORG == org]
    demand <- demand_matrix[demand_matrix$ORG == org, 2:3]

    # the igraph version is also included here; it takes even more time to run
    # destinations <- demand_matrix$DEST[demand_matrix$ORG == org]
    # sht_path <- unlist(shortest_paths(network_igraph, from = c(org), to = c(destinations), mode = c("out"), weights = resolved.Network[[5]]$t0, output = c("epath")), recursive = FALSE)
    # sht_path <- sapply(sht_path, as_ids)

    # the procedure with cppRouting
    sht_path <- get_multi_paths(network_cpprouting_graph, from = org, to = destinations, long = TRUE)
    sht_path$end <- c(sht_path$node[2:nrow(sht_path)], 0)
    sht_path <- sht_path[sht_path$from != sht_path$node, ]
    sht_path$paste <- paste(sht_path$end, sht_path$node)
    edge_id_node_sequence <- as.integer(unlist(strsplit(sht_path$paste, split = " ")))
    # convert the node sequence of each shortest path into edge ids
    sht_path$edge_ids <- get.edge.ids(network_igraph, edge_id_node_sequence)
    # "to" is character; converting it to integer makes left_join() faster
    sht_path$to <- as.integer(sht_path$to)
    sht_path <- left_join(sht_path, demand, by = c("to" = "DEST"))
    # add the traffic to each link (the goal is to compute the volume on every link)
    V2[sht_path$edge_ids] <- V2[sht_path$edge_ids] + sht_path$TRIPS
  }

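The demand matrix has more than 4e6 non-zero values. I tried to compute the shortest paths for all origin-destination pairs with get_path_pair, but it never finished and I had to restart my laptop. I only have 8 GB of RAM. I then tried to compute the shortest paths for only 8e5 pairs at a time (splitting my matrix into 5 parts), but the third part almost never finished.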

  length_group <- min(nrow(demand_matrix)/4,800000)

  path_pair <- get_path_pair(Graph = test_net , from = demand_matrix$ORG[1:length_group],to = demand_matrix$DEST[1:length_group], long = TRUE)
  path_pair <- rbind(path_pair , get_path_pair(Graph = test_net , from = demand_matrix$ORG[(length_group+1):(2*length_group)],to = demand_matrix$DEST[(length_group+1):(2*length_group)],long = TRUE))
  path_pair <- rbind(path_pair , get_path_pair(Graph = test_net , from = demand_matrix$ORG[((2*length_group)+1):(3*length_group)],to = demand_matrix$DEST[((2*length_group)+1):(3*length_group)],long = TRUE))
  path_pair <- rbind(path_pair , get_path_pair(Graph = test_net , from = demand_matrix$ORG[((3*length_group)+1):(4*length_group)],to = demand_matrix$DEST[((3*length_group)+1):(4*length_group)],long = TRUE))
  path_pair <- rbind(path_pair , get_path_pair(Graph = test_net , from = demand_matrix$ORG[((4*length_group)+1):(5*length_group)],to = demand_matrix$DEST[((4*length_group)+1):(5*length_group)],long = TRUE))
  path_pair <- rbind(path_pair , get_path_pair(Graph = test_net , from = demand_matrix$ORG[((5*length_group)+1):nrow(demand_matrix)],to = demand_matrix$DEST[((5*length_group)+1):nrow(demand_matrix)],long = TRUE))

Tags: r, large-data, shortest-path

Solution


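If I understand correctly, demand_matrix contains all possible combinations between origin and destination nodes? (3500² = 12,250,000)

Since cppRouting functions are vectorized, why not try: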

get_multi_paths(graph, origin, dest, long = TRUE)

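with origin and dest being your origin and destination nodes, each of length about 3500. get_multi_paths is the equivalent of the get_distance_matrix function: it uses the main property of Dijkstra's algorithm, which is to find the shortest paths between an origin node "n" and all other nodes. The full Dijkstra algorithm is therefore run N times, N being the length of origin.
On the other hand, the get_*_pair functions run Dijkstra's algorithm with a stopping criterion: the search ends when the destination node is reached, so one search is needed per origin-destination pair. By using them you basically increased the running time by a factor of about 1500 (not 3500, because Dijkstra's algorithm is aborted early in the latter option).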
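To make the difference concrete, here is a minimal sketch of Dijkstra's algorithm in base R with an optional stopping criterion. It is only an illustration of the idea, not how cppRouting is implemented; the function `dijkstra`, its arguments, and the 5-node chain network are all hypothetical.

```r
# Toy Dijkstra in base R (illustration only).
# `edges` is a data frame with columns from, to, cost (node ids 1..n).
dijkstra <- function(edges, n, source, target = NA) {
  dist <- rep(Inf, n)
  dist[source] <- 0
  settled <- rep(FALSE, n)
  n_settled <- 0
  repeat {
    # pick the closest reachable node that is not settled yet
    candidates <- which(!settled & is.finite(dist))
    if (length(candidates) == 0) break
    u <- candidates[which.min(dist[candidates])]
    settled[u] <- TRUE
    n_settled <- n_settled + 1
    # stopping criterion of a pair-wise query: the target is settled
    if (!is.na(target) && u == target) break
    # relax the outgoing edges of u
    out <- edges[edges$from == u, ]
    for (k in seq_len(nrow(out))) {
      v <- out$to[k]
      new_dist <- dist[u] + out$cost[k]
      if (new_dist < dist[v]) dist[v] <- new_dist
    }
  }
  list(dist = dist, n_settled = n_settled)
}

# Hypothetical chain 1 -> 2 -> 3 -> 4 -> 5 with unit costs
edges <- data.frame(from = 1:4, to = 2:5, cost = 1)

full  <- dijkstra(edges, n = 5, source = 1)              # one-to-all search
early <- dijkstra(edges, n = 5, source = 1, target = 3)  # stops at node 3
```

In this sketch, a single one-to-all run already yields the distance to every destination, whereas the early-stopping variant has to be restarted for each destination of the same origin.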

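If you have memory problems, splitting all the combinations into smaller chunks is a good strategy. However, I suggest splitting the origin nodes into 10 chunks and running get_multi_paths between each origin chunk and all destination nodes. At each iteration you can aggregate the results to obtain the cumulative flow on each node of the network.
Finally, try using lapply() and data.table::rbindlist() rather than multiple rbind() calls.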
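As a small sketch of that last suggestion, the pattern is to build every chunk inside lapply() and bind the resulting list once. The helper `make_chunk` and its columns are hypothetical stand-ins for one chunk of routing output.

```r
library(data.table)

# Hypothetical stand-in for the result of one routing chunk
make_chunk <- function(i) {
  data.table(chunk = i, node = 1:3, traffic = i * c(10, 20, 30))
}

# Build all chunks in a list, then bind them in a single call
chunks <- lapply(1:4, make_chunk)
combined <- rbindlist(chunks)
```

Binding once avoids copying the growing table at every step, which is what repeated rbind() calls do.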

Edit: if you want to accumulate the flow on edges, here is a piece of code:

library(cppRouting)
library(data.table)

# or: origin nodes (assumed to be of length 3500)
# dest: destination nodes

chunk_size <- 350
test <- lapply(seq(1, length(or), chunk_size), function(x) {
  print(x)
  # min() keeps the last chunk within the bounds of `or`
  res <- get_multi_paths(graph, or[x:min(x + chunk_size - 1, length(or))],
                         dest,
                         long = TRUE)
  setDT(res)
  # merge the demand of each trip (origin-destination) here, so that
  # `res` has a `demand` column before the aggregation below

  # reconstruct edges (by reference using data.table)
  res[, edge_from := c(node[-1], NA), .(from, to)]

  # aggregate demand on each edge
  res <- res[!is.na(edge_from), .(traffic = sum(demand)), .(edge_from, node)]
  gc()
  return(res)
})

test <- rbindlist(test)
test <- test[, .(traffic = sum(traffic)), .(edge_from, node)]

Of course, you can change chunk_size according to your available memory.
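The code above leaves the demand merge as a comment. One possible way to fill it in is sketched below on toy data. The column names ORG, DEST and TRIPS are taken from the question; the path table is a hypothetical stand-in for the long output used above, and it assumes, as the edge reconstruction above does, that the rows of one path are consecutive. Types may need converting first, since the question notes that "to" comes back as character.

```r
library(data.table)

# Hypothetical long-format path table: one row per node of each path,
# here path 1 -> 2 -> 3 and path 1 -> 2, listed from destination to origin
res <- data.table(
  from = c(1, 1, 1, 1, 1),
  to   = c(3, 3, 3, 2, 2),
  node = c(3, 2, 1, 2, 1)
)

# Hypothetical demand table with the column names from the question
demand_dt <- data.table(ORG = c(1, 1), DEST = c(3, 2), TRIPS = c(100, 40))

# Update join: attach the demand of each origin-destination pair to every
# row of its path, by reference and without reordering rows
res[demand_dt, TRIPS := i.TRIPS, on = .(from = ORG, to = DEST)]

# Rebuild edges per path, then sum the demand on each edge
res[, edge_from := c(node[-1], NA), by = .(from, to)]
traffic <- res[!is.na(edge_from), .(traffic = sum(TRIPS)), by = .(edge_from, node)]
```

An update join is used here rather than merge() so that the row order within each path, on which the edge reconstruction relies, is left untouched.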
