Is it possible to optimize this lookup function? (faster IF statement)

Question

Can this code be optimized?

A single run on my data takes about 2 seconds. Because I have to run it repeatedly, it adds a considerable amount of time to the whole program.

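This code sets up two geofences (f1, f2) and checks whether each point in node_coords lies inside one of them. The result is a logical index vector that can be used to filter node_coords, keeping only the points that lie in one of the two geofences.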
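As a sketch of the filtering step just described (the toy points in `pts` and the per-fence vectors `in_f1`/`in_f2` are made up for illustration and are not the question's data), a logical index built from whole-column comparisons can subset the data frame directly, with `|` combining the two fences:

```r
# Toy points (hypothetical data) and the two fences from the question
pts <- data.frame(lon = c(11.5, 12.2, 12.12), lat = c(48.1, 47.84, 47.86))
fences <- data.frame(
  left   = c(11.34175, 12.10723),
  bottom = c(47.98702, 47.84540),
  right  = c(11.77417, 12.15024),
  top    = c(48.27649, 47.87435)
)

# One vectorized comparison per fence (neither fence wraps the date line)
in_f1 <- with(pts, fences$top[1] >= lat & lat >= fences$bottom[1] &
                   fences$left[1] <= lon & lon <= fences$right[1])
in_f2 <- with(pts, fences$top[2] >= lat & lat >= fences$bottom[2] &
                   fences$left[2] <= lon & lon <= fences$right[2])

# A point is kept if it lies in at least one fence
index <- in_f1 | in_f2
index
#> [1]  TRUE FALSE  TRUE

# Keep only the rows inside a fence (rows 1 and 3 here)
kept <- pts[index, ]
kept
```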

Thanks a lot! Andreas

library("vctrs")



node_coords<-structure(list(lon = c(11.34175, 12.2063556, 12.2066937, 12.2068632, 
12.2070187, 12.2078502), lat = c( 48.27649, 47.8399432, 47.8397677, 
47.8396466, 47.8396952, 47.8395169)), row.names = c(172422L,
260117L, 147288L, 1337832L, 1850176L, 260151L), class = "data.frame")


check_if_point_is_within_geofence <- function(top, left, bottom, right, latitude, longitude){
  # Check latitude bounds first.
  if(top >= latitude && latitude >= bottom){
    # If the bounding box doesn't wrap the date line, the longitude
    #   must lie between the bounds. If it does wrap the date line,
    #   the longitude only needs to be at or above the left bound
    #   or at or below the right bound.
    if(left <= right && left <= longitude && longitude <= right){
      return(TRUE)
    } else if(left > right && (left <= longitude || longitude <= right)) {
      return(TRUE) 
    }
  }
  return(FALSE)
}

geofence <- function(lon,lat){
  f1 <- base::data.frame("left" = 11.34175, "bottom" = 47.98702 ,"right" = 11.77417 ,"top" = 48.27649)
  f2 <- base::data.frame("left" = 12.10723, "bottom" = 47.84540, "right" = 12.15024, "top" = 47.87435 )
  
  fences <- rbind.data.frame(f1,f2)
  f_list <- apply(fences,1,function(x)  check_if_point_is_within_geofence(top = x[4],left = x[1],bottom = x[2],right = x[3],latitude = lat,longitude = lon ) )
  
  if (vec_in(TRUE,f_list))
    return(TRUE)
  return(FALSE)
}

index <- apply(cbind(node_coords$lon,node_coords$lat),1,function(x)  geofence(x[1],x[2]) )
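The date-line branch in check_if_point_is_within_geofence is easiest to see with a fence whose `left` bound is greater than its `right` bound. A minimal sketch, using a hypothetical vectorized helper `in_fence()` (not part of the question's code) and made-up points near ±180° longitude:

```r
# Hypothetical helper: vectorized over lat/lon, scalar fence bounds
in_fence <- function(top, left, bottom, right, lat, lon) {
  lat_ok <- top >= lat & lat >= bottom
  lon_ok <- if (left <= right) {
    left <= lon & lon <= right   # ordinary fence: between the bounds
  } else {
    left <= lon | lon <= right   # wraps the date line: either side of it
  }
  lat_ok & lon_ok
}

lon <- c(175, -175, 0)
lat <- c(10, 10, 10)

# Fence from 170 E across the date line to 170 W
wrap <- in_fence(top = 20, left = 170, bottom = 0, right = -170, lat = lat, lon = lon)
wrap
#> [1]  TRUE  TRUE FALSE

# Fence from 170 W across Greenwich to 170 E
no_wrap <- in_fence(top = 20, left = -170, bottom = 0, right = 170, lat = lat, lon = lon)
no_wrap
#> [1] FALSE FALSE  TRUE
```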

Tags: r, dplyr, tidyr

Solution


Here is an optimized version of your code:

vec_geofence <- function(top, left, bottom, right, lat, lon) {

  # The mask vector records whether each coordinate lies in any of the
  #   fences defined by the top, left, bottom and right vectors. Initially
  #   no coordinate has been tested, so every value starts as FALSE.
  mask <- rep(FALSE, length(lon))
  
  # For each fence...
  for(i in seq_along(top)) {

    # ... check for all the coordinates if they are inside of the fence
    if( left[i] > right[i] )
      new_mask <- top[i] >= lat & lat >= bottom[i] & (left[i] <= lon | lon <= right[i])
    else
      new_mask <- top[i] >= lat & lat >= bottom[i] & (left[i] <= lon & lon <= right[i])
    
    # For the coordinates not yet seen in any fence that lie inside the
    #   current fence, set the corresponding mask value to TRUE
    mask[!mask][new_mask] <- TRUE

    # The coordinates that will pass through to the next fence check are the ones
    #   that still haven't been seen inside a fence
    lat <- lat[!new_mask]
    lon <- lon[!new_mask]
  }
  
  mask
}

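# The fences are built once, outside the function
fences <- data.frame(
  left   = c(11.34175, 12.10723),
  bottom = c(47.98702, 47.84540),
  right  = c(11.77417, 12.15024),
  top    = c(48.27649, 47.87435)
)

vec_geofence(fences$top, fences$left, fences$bottom, fences$right, node_coords$lat, node_coords$lon)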
#> [1]  TRUE FALSE FALSE FALSE FALSE FALSE
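The line `mask[!mask][new_mask] <- TRUE` relies on R's nested subassignment: `new_mask` is indexed relative to the coordinates that are still unmatched, and R writes the result back into those positions of `mask`. A small sketch with made-up vectors:

```r
# Positions 1 and 4 were already matched by an earlier fence
mask <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

# The current fence only saw the 3 unmatched coordinates (original
# positions 2, 3, 5) and matched the second of them, i.e. position 3
new_mask <- c(FALSE, TRUE, FALSE)

# Same effect as: tmp <- mask[!mask]; tmp[new_mask] <- TRUE; mask[!mask] <- tmp
mask[!mask][new_mask] <- TRUE
mask
#> [1]  TRUE FALSE  TRUE  TRUE FALSE
```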

I changed four main things:

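  1. Moved the fences data frame outside the geofence function, so it is not recreated every time the function runs.
  2. Turned the if statements in check_if_point_is_within_geofence into logical formulas.
  3. Merged the two functions into one, which avoids the overhead of repeated function calls.
  4. Converted the single-value (scalar) logical formulas into vectorized logical formulas.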
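Changes 2 and 4 can be illustrated on a single non-wrapping fence (fence f1 from the question, with three made-up points): the scalar `&&` formula has to be evaluated once per point, while the `&` formula handles the whole vector in one expression and gives the same result.

```r
# Fence f1 from the question
top <- 48.27649; bottom <- 47.98702; left <- 11.34175; right <- 11.77417

# Made-up points: inside, below the fence, east of the fence
lat <- c(48.0, 47.9, 48.1)
lon <- c(11.5, 11.5, 12.0)

# Scalar version: one evaluation of the && formula per point
scalar <- vapply(seq_along(lat), function(i) {
  top >= lat[i] && lat[i] >= bottom && left <= lon[i] && lon[i] <= right
}, logical(1))

# Vectorized version: a single & expression over all points
vectorized <- top >= lat & lat >= bottom & left <= lon & lon <= right

identical(scalar, vectorized)
#> [1] TRUE
vectorized
#> [1]  TRUE FALSE FALSE
```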

Computing the geofence index for a node_coords data frame of about 10,000 rows against a fences data frame of 10,000 rows takes this function about 5 seconds:

node_coords_10k = do.call(rbind.data.frame, rep(list(node_coords), 10000/6))
fences_10k = do.call(rbind.data.frame, rep(list(fences), 10000/2))

system.time(vec_geofence(
    fences_10k$top, fences_10k$left, fences_10k$bottom, fences_10k$right, 
    node_coords_10k$lat, node_coords_10k$lon
))
#> user  system elapsed 
#> 4.78    0.03    4.85 
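Before relying on the faster version, it is worth checking that it returns exactly the same index as the original row-by-row code. A sketch on random points (the seed, the point ranges and the name `geofence_rowwise` are arbitrary choices for this check; `any()` stands in for `vctrs::vec_in(TRUE, ...)` so the block needs no packages):

```r
# Original scalar check from the question
check_if_point_is_within_geofence <- function(top, left, bottom, right, latitude, longitude) {
  if (top >= latitude && latitude >= bottom) {
    if (left <= right && left <= longitude && longitude <= right) return(TRUE)
    if (left > right && (left <= longitude || longitude <= right)) return(TRUE)
  }
  FALSE
}

fences <- data.frame(
  left   = c(11.34175, 12.10723),
  bottom = c(47.98702, 47.84540),
  right  = c(11.77417, 12.15024),
  top    = c(48.27649, 47.87435)
)

# Row-by-row approach from the question (any() replaces vctrs::vec_in(TRUE, ...))
geofence_rowwise <- function(lon, lat) {
  f_list <- apply(fences, 1, function(x)
    check_if_point_is_within_geofence(top = x[4], left = x[1], bottom = x[2],
                                      right = x[3], latitude = lat, longitude = lon))
  any(f_list)
}

# Vectorized version from the answer
vec_geofence <- function(top, left, bottom, right, lat, lon) {
  mask <- rep(FALSE, length(lon))
  for (i in seq_along(top)) {
    if (left[i] > right[i]) {
      new_mask <- top[i] >= lat & lat >= bottom[i] & (left[i] <= lon | lon <= right[i])
    } else {
      new_mask <- top[i] >= lat & lat >= bottom[i] & (left[i] <= lon & lon <= right[i])
    }
    mask[!mask][new_mask] <- TRUE
    lat <- lat[!new_mask]
    lon <- lon[!new_mask]
  }
  mask
}

# Random points covering both fences and their surroundings
set.seed(1)
lon <- runif(2000, 11, 12.5)
lat <- runif(2000, 47.7, 48.4)

index_rowwise <- apply(cbind(lon, lat), 1, function(x) geofence_rowwise(x[1], x[2]))
index_vec <- vec_geofence(fences$top, fences$left, fences$bottom, fences$right, lat, lon)

identical(index_rowwise, index_vec)
#> [1] TRUE
```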
