首页 > 解决方案 > 查找函数调用的物种特定系数的有效方法

问题描述

Andrew Robinson 在irebreakeR中展示了如何使用直径和高度计算树的体积。他创建了一个函数,该函数使用取决于种类直径的系数。简化版本如下所示:

funRobinson <- function(species, diameter, height) {
  bf_params <- data.frame(species  = c("Spruce", "Oak"),
                          b0_small = c(26.729,  29.790),
                          b1_small = c( 0.01189, 0.00997),
                          b0_large = c(32.516,  85.150),
                          b1_large = c( 0.01181, 0.00841))
  dimensions <- data.frame(diameter   = diameter,
                           height     = height,
                           species    = as.character(species),
                           this_order = 1:length(species))
  dimensions <- merge(y=dimensions, x=bf_params, all.y=TRUE, all.x=FALSE)
  dimensions <- dimensions[order(dimensions$this_order, decreasing=FALSE),]
  b0 <- with(dimensions, ifelse(diameter <= 20.5, b0_small, b0_large))
  b1 <- with(dimensions, ifelse(diameter <= 20.5, b1_small, b1_large))
  b0 + b1 * dimensions$diameter^2 * dimensions$height
}

对我来说,这种方法看起来很简单,但它会创建一个额外data.frame需要排序的额外调用,并调用ifelse两次以区分小树 ( diameter <= 20.5) 和大树。我正在寻找一种更有效的方法(低内存消耗执行时间)来查找特定物种的系数。我很欣赏在不编辑函数的情况下为其他物种添加系数的可能性。

示例数据集和性能:

dat <- data.frame(species = c("Spruce", "Spruce", "Oak", "Oak", "Fir"),
                  diameter = c(4,   30,  4,   30,  30),
                  height  = c(30,  100, 30,  100, 100))
with(dat, funRobinson(species, diameter, height))
#[1]   32.4362 1095.4160   34.5756  842.0500        NA

library(microbenchmark)
microbenchmark(
  Robinson = with(dat, funRobinson(species, diameter, height))
)
#Unit: milliseconds
#     expr      min       lq     mean   median       uq      max neval
# Robinson 1.832604 1.860334 1.948054 1.876155 1.905009 3.054021   100


set.seed(0)
size <- 1e5
dat2 <- data.frame(species = sample(c("Spruce", "Oak", "Fir"), size=size, replace = TRUE)
       , diameter = runif(size, 1, 50)
       , height  = runif(size, 1, 100))

microbenchmark(
  Robinson = with(dat2, funRobinson(species, diameter, height))
)
#Unit: milliseconds
#     expr      min       lq     mean   median       uq      max neval
# Robinson 203.8171 219.9265 234.0798 227.5911 250.6204 278.9918   100

标签: rperformancefunctionparameterscoefficients

解决方案


我猜它是在避免使用数据框,而是直接从向量(或矩阵)中调用值。而且b0和b1调用的值是一样的,所以我们只需要计算一次。

下面是一个快速尝试,很可能它可以做得更快。我基本上为每个参数制作了2个矩阵,并调出相应的行和列,根据

f2 <- function(species, diameter, height) {
  species_avail=c("Spruce", "Oak")
  params_b0 = cbind(b0_small = c(26.729,  29.790),
                    b0_large = c(32.516,  85.150))
  rownames(params_b0) = species_avail
  params_b1 = cbind(b1_small = c( 0.01189, 0.00997),
                    b1_large = c( 0.01181, 0.00841))
  rownames(params_b1) = species_avail
  ROWS = match(species,species_avail)
  COLS = +(diameter > 20.5) + 1
  idx = cbind(ROWS,COLS)
  b0 <- params_b0[idx]
  b1 <- params_b1[idx]

  b0 + b1 * diameter^2 * height
}

创建数据:

set.seed(0)
size <- 1e5
dat2 <- data.frame(species = sample(c("Spruce", "Oak", "Fir"), size=size, replace = TRUE)
                   , diameter = runif(size, 1, 50)
                   , height  = runif(size, 1, 100))

检查函数返回相同的东西:

identical(
with(dat2,funRobinson(species, diameter, height)),
with(dat2,f2(species,diameter,height))
)
[1] TRUE

测试:

microbenchmark(
  Robinson = with(dat2, funRobinson(species, diameter, height)),
  f2 = with(dat2, f2(species, diameter, height))
)

Unit: milliseconds
     expr        min        lq      mean    median        uq       max neval
 Robinson 249.677157 275.23375 303.97532 298.72475 329.04318 391.53807   100
       f2   9.423471  10.16365  13.86918  10.48073  16.06827  65.19541   100
 cld
   b
  a 

推荐阅读