r - Can this lookup function be optimized? (Faster IF statements)
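Problem description
Can this code be optimized?
A single run on my data takes about 2 seconds, and because I have to run it repeatedly, it adds considerable time to the whole program.
The code sets up 2 geofences (f1, f2) and checks whether each point in node_coords lies inside one of them. The result is a logical index vector that can be used to filter node_coords, keeping only the points that lie inside one of the two geofences.
Thank you very much! Andreas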
library("vctrs")
node_coords<-structure(list(lon = c(11.34175, 12.2063556, 12.2066937, 12.2068632,
12.2070187, 12.2078502), lat = c( 48.27649, 47.8399432, 47.8397677,
47.8396466, 47.8396952, 47.8395169)), row.names = c(172422L,
260117L, 147288L, 1337832L, 1850176L, 260151L), class = "data.frame")
check_if_point_is_within_geofence <- function(top, left, bottom, right, latitude, longitude){
  # Check latitude bounds first.
  if(top >= latitude && latitude >= bottom){
    # If your bounding box doesn't wrap
    # the date line the value
    # must be between the bounds.
    # If your bounding box does wrap the
    # date line it only needs to be
    # higher than the left bound or
    # lower than the right bound.
    if(left <= right && left <= longitude && longitude <= right){
      return(TRUE)
    } else if(left > right && (left <= longitude || longitude <= right)) {
      return(TRUE)
    }
  }
  return(FALSE)
}
geofence <- function(lon,lat){
  f1 <- base::data.frame("left" = 11.34175, "bottom" = 47.98702 ,"right" = 11.77417 ,"top" = 48.27649)
  f2 <- base::data.frame("left" = 12.10723, "bottom" = 47.84540, "right" = 12.15024, "top" = 47.87435 )
  fences <- rbind.data.frame(f1,f2)
  f_list <- apply(fences,1,function(x) check_if_point_is_within_geofence(top = x[4],left = x[1],bottom = x[2],right = x[3],latitude = lat,longitude = lon ) )
  if (vec_in(TRUE,f_list))
    return(TRUE)
  return(FALSE)
}
index <- apply(cbind(node_coords$lon,node_coords$lat),1,function(x) geofence(x[1],x[2]) )
Solution
This would be an optimized version of your code:
vec_geofence <- function(top, left, bottom, right, lat, lon) {
  # The mask vector represents whether a coordinate is seen in any of the
  # fences defined by the top, left, bottom and right vectors. In the beginning
  # none of the coordinates have been tested, so the respective value in the
  # mask vector is initialized as FALSE.
  mask <- rep(FALSE, length(lon))
  # For each fence...
  for(i in seq_along(top)) {
    # ... check for all the coordinates if they are inside of the fence
    if( left[i] > right[i] )
      new_mask <- top[i] >= lat & lat >= bottom[i] & (left[i] <= lon | lon <= right[i])
    else
      new_mask <- top[i] >= lat & lat >= bottom[i] & (left[i] <= lon & lon <= right[i])
    # For all the coordinates that hadn't yet been seen in a fence, and that
    # are inside the current fence, update the respective mask value to TRUE
    mask[!mask][new_mask] <- TRUE
    # The coordinates that will pass through to the next fence check are the ones
    # that still haven't been seen inside a fence
    lat <- lat[!new_mask]
    lon <- lon[!new_mask]
  }
  mask
}
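# The fences data frame is now defined once, outside the function
# (same values as f1 and f2 in the question).
fences <- data.frame(left = c(11.34175, 12.10723), bottom = c(47.98702, 47.84540),
                     right = c(11.77417, 12.15024), top = c(48.27649, 47.87435))
vec_geofence(fences$top, fences$left, fences$bottom, fences$right, node_coords$lat, node_coords$lon)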
#> [1] TRUE FALSE FALSE FALSE FALSE FALSE
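The question asks for a logical index that can be used to filter node_coords, so here is a minimal sketch of that last step. It is self-contained: the data is copied from the question, and in_any_fence is a hypothetical, compact stand-in for vec_geofence (it is not part of the answer above and skips the trick of dropping points that have already matched).

```r
# Sample data copied from the question.
node_coords <- data.frame(
  lon = c(11.34175, 12.2063556, 12.2066937, 12.2068632, 12.2070187, 12.2078502),
  lat = c(48.27649, 47.8399432, 47.8397677, 47.8396466, 47.8396952, 47.8395169)
)
fences <- data.frame(
  left   = c(11.34175, 12.10723),
  bottom = c(47.98702, 47.84540),
  right  = c(11.77417, 12.15024),
  top    = c(48.27649, 47.87435)
)

# Hypothetical helper (an assumption for this sketch): TRUE where a point
# lies inside at least one fence.
in_any_fence <- function(fences, lat, lon) {
  mask <- rep(FALSE, length(lon))
  for (i in seq_len(nrow(fences))) {
    f <- fences[i, ]
    if (f$left > f$right) {
      # Fence wraps the date line: longitude may be on either side.
      lon_ok <- f$left <= lon | lon <= f$right
    } else {
      lon_ok <- f$left <= lon & lon <= f$right
    }
    mask <- mask | (f$top >= lat & lat >= f$bottom & lon_ok)
  }
  mask
}

index <- in_any_fence(fences, node_coords$lat, node_coords$lon)

# Keep only the rows whose point falls inside one of the fences.
inside <- node_coords[index, ]
print(inside)
```

With the sample data only the first point is kept, which matches the output shown above.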
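I changed 4 main things:
- Moved the fences data frame out of the geofence function, so it is not recreated every time the function runs
- Converted the if statements in check_if_point_is_within_geofence into a logical formula
- Merged the two functions into one, which avoids the function call overhead
- Converted the single-value logical formula into a vectorized logical formula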
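To illustrate the last change in isolation, here is a small sketch that compares a scalar check (using if and &&, called once per point) with the equivalent vectorized formula (using &, evaluated once for all points). It assumes a single fence that does not wrap the date line; scalar_check is a hypothetical name, and the third sample point is made up for illustration.

```r
# One fence (f1 from the question) and three sample points.
top <- 48.27649; bottom <- 47.98702
left <- 11.34175; right <- 11.77417
lat <- c(48.27649, 47.8399432, 48.1)   # third value is illustrative
lon <- c(11.34175, 12.2063556, 11.5)   # third value is illustrative

# Scalar version: && only works on single values, so the function has to be
# called once per point.
scalar_check <- function(lat, lon) {
  if (top >= lat && lat >= bottom && left <= lon && lon <= right) TRUE else FALSE
}
one_by_one <- vapply(seq_along(lat),
                     function(i) scalar_check(lat[i], lon[i]),
                     logical(1))

# Vectorized version: & works element-wise, so all points are tested at once.
all_at_once <- top >= lat & lat >= bottom & left <= lon & lon <= right

print(one_by_one)
print(all_at_once)
print(identical(one_by_one, all_at_once))
```

Both approaches give the same logical vector; the vectorized form avoids one R-level function call per point, which is where most of the time went in the original code.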
This function takes about 5 seconds for a node_coords data frame with roughly 10 000 rows and a fences data frame with 10 000 rows.
node_coords_10k = do.call(rbind.data.frame, rep(list(node_coords), 10000/6))
fences_10k = do.call(rbind.data.frame, rep(list(fences), 10000/2))
system.time(vec_geofence(
fences_10k$top, fences_10k$left, fences_10k$bottom, fences_10k$right,
node_coords_10k$lat, node_coords_10k$lon
))
#> user system elapsed
#> 4.78 0.03 4.85