Most efficient way to compute Euclidean distances in a large matrix

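Problem description

I want to find the most memory- and time-efficient way to compute Euclidean distances on a large matrix. I ran the small benchmark below to compare the packages I know of: parallelDist, geodist, fields, and stats. I also considered a custom function that combines Rcpp and bigmemory. These are the results I found (reprex below), but I would like to know whether there are other efficient packages or solutions for this task.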
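For readers who want the target quantity spelled out, here is a minimal sketch of what every benchmarked function computes: the pairwise distance d(i, j) = sqrt(sum((x_i - x_j)^2)). The three toy points and the names `pts`, `d_manual`, and `d_ref` are assumptions made for illustration and do not appear in the post.

```r
# Three toy points (assumed data, not from the post): a 3-4-5 triangle pattern
pts <- rbind(c(0, 0), c(3, 4), c(6, 8))

# Direct definition: d(i, j) = sqrt(sum((x_i - x_j)^2))
n <- nrow(pts)
d_manual <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d_manual[i, j] <- sqrt(sum((pts[i, ] - pts[j, ])^2))
  }
}

# Base R reference: stats::dist() returns a condensed "dist" object,
# which as.matrix() expands to the full symmetric matrix
d_ref <- as.matrix(stats::dist(pts))
```

The double loop is only meant to make the definition explicit; it is far too slow for large inputs, which is why the packages above are being compared.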

Results

benchmrk
#>   package   time        alloc
#>1: parDist  0.298 5.369186e-04
#>2:  fields  1.079 9.486198e-03
#>3:    rcpp 54.422 2.161113e+00
#>4:   stats  0.770 5.788603e+01
#>5: geodist  2.513 1.157635e+02

# plot
ggplot(benchmrk, aes(x=alloc , y=time, color= package, label=package)) +
  geom_label(alpha=.5) +
  coord_trans(x="log10", y="log10") +
  theme(legend.position = "none")

[Figure: time versus memory allocation for each package, both axes on a log10 scale]
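As a rough guide to the memory side of the comparison, the size of the result alone can be estimated from the number of points. A full n x n matrix of doubles stores n^2 values, while the condensed lower triangle kept by a "dist" object (as returned by stats::dist) stores n * (n - 1) / 2. The helper name `dist_memory_mb` is hypothetical, and the estimate ignores temporary copies made during the computation.

```r
# Hypothetical helper (not from the post): approximate size in MB of the
# result object only, assuming 8 bytes per double and 1 MB = 1024^2 bytes
dist_memory_mb <- function(n) {
  bytes_per_double <- 8
  c(full      = n^2 * bytes_per_double,
    condensed = n * (n - 1) / 2 * bytes_per_double) / 1024^2
}

# Example with an assumed number of points
dist_memory_mb(10000)
```

Because the cost grows quadratically with n, doubling the number of points roughly quadruples the memory needed for the result.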

Reprex

library(parallelDist)
library(geodist)
library(fields)
library(stats)
library(bigmemory)
library(Rcpp)

library(lineprof)
library(geobr)
library(sf)
library(ggplot2)
library(data.table)


# data input
df <- geobr::read_weighting_area()
gc(reset = TRUE)

# reproject to EPSG:3857 (Web Mercator, a projected CRS; note this is not UTM)
df <- st_transform(df, crs = 3857)

# get spatial coordinates
coords <- suppressWarnings(st_coordinates( st_centroid(df) ))

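# compile the custom Rcpp code; euc_dist.cpp (not shown here) is expected to define BigArmaEuc()
sourceCpp("euc_dist.cpp")

# wrapper: allocate a zero-filled big.matrix for the result and let BigArmaEuc() fill it in place
bigMatrixEuc <- function(bigMat){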
  zeros <- big.matrix(nrow = nrow(bigMat)-1,
                      ncol = nrow(bigMat)-1,
                      init = 0,
                      type = typeof(bigMat))
  BigArmaEuc(bigMat@address, zeros@address)
  return(zeros)
}




### Start tests
perf_fields  <- lineprof(dist_fields <- fields::rdist(coords) )
perf_geodist <- lineprof(dist_geodist <- geodist::geodist(coords, measure = "cheap") )
perf_stats   <- lineprof(dist_stats <- stats::dist(coords) )
perf_parDist <- lineprof(dist_parDist <- parallelDist::parDist(coords, method = "euclidean") )
perf_rcpp <- lineprof(dist_rcpp <- bigMatrixEuc( as.big.matrix(coords) ) )

perf_fields$package  <- 'fields'
perf_geodist$package <- 'geodist'
perf_stats$package   <- 'stats'
perf_parDist$package <- 'parDist'
perf_rcpp$package <- 'rcpp'


# gather results
benchmrk <- rbind(perf_fields, perf_geodist, perf_stats , perf_parDist, perf_rcpp)
benchmrk <- setDT(benchmrk)[, .(time  =sum(time), alloc = sum(alloc)), by=package][order(alloc)]
benchmrk
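If the full distance matrix is only an intermediate step, one option that could be added to the comparison is to process it in row blocks so that only a slice is in memory at any time. The sketch below is an assumption-laden illustration in base R, not a benchmarked solution: the helper name `blockwise_row_sums`, the `block_size` parameter, and the choice of reducing each slice to row sums are all hypothetical. It uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b, which is fast but can lose precision for points that are nearly identical.

```r
# Hypothetical helper (not from the post): sum of distances from each point to
# all others, holding only a block_size x n slice of the matrix at a time
blockwise_row_sums <- function(x, block_size = 1000L) {
  x      <- as.matrix(x)
  n      <- nrow(x)
  sq     <- rowSums(x^2)                      # squared norm of every row
  starts <- seq(1L, n, by = block_size)
  out    <- vector("list", length(starts))

  for (k in seq_along(starts)) {
    idx <- starts[k]:min(starts[k] + block_size - 1L, n)
    # squared distances between the rows in this block and all rows
    d2 <- outer(sq[idx], sq, "+") - 2 * tcrossprod(x[idx, , drop = FALSE], x)
    d2[d2 < 0] <- 0                           # clamp tiny negatives from rounding
    d2[cbind(seq_along(idx), idx)] <- 0       # self-distances are exactly zero
    out[[k]] <- rowSums(sqrt(d2))             # reduce the slice, then discard it
  }
  unlist(out)
}

# Usage on assumed toy data
set.seed(1)
pts <- matrix(runif(40), ncol = 2)
blockwise_row_sums(pts, block_size = 7L)
```

The reduction step can be swapped for whatever summary is actually needed (nearest neighbour, counts within a radius, and so on); if the complete matrix itself is required, blocking does not reduce the final memory footprint.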

Tags: r, matrix, geospatial, distance, sf

Solution

