r - for loop inside vs. outside a function

Problem description

This came up in a discussion while benchmarking various solutions in this SO post. Consider the following code:

# global environment is empty - new session just started
# set up
set.seed(20181231)
n <- sample(10^3:10^4,10^3)
for_loop <- function(n) {
  out <- integer(length(n))
  for(k in 1:length(out)) {
    if((k %% 2) == 0){
      out[k] <- 0L
      next
    }
    out[k] <- 1L
    next
  }
  out
}
# benchmarking
res <- microbenchmark::microbenchmark(
  for_loop = {
    out <- integer(length(n))
    for(k in 1:length(out)) {
      if((k %% 2) == 0){
        out[k] <- 0L
        next
      }
      out[k] <- 1L
      next
    }
    out
  },
  for_loop(n),
  times = 10^4
)

Below are the benchmark results for the exact same loop, once wrapped in a function and once not:

# Unit: microseconds
#        expr      min       lq      mean   median       uq      max neval cld
#    for_loop 3216.773 3615.360 4120.3772 3759.771 4261.377 34388.95 10000   b
# for_loop(n)  162.280  180.149  225.8061  190.724  211.875 26991.58 10000  a 
ggplot2::autoplot(res)

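As can be seen, there is a huge difference in efficiency: the median is about 3760 microseconds for the bare loop versus about 191 microseconds for the function, roughly a factor of 20. What is the underlying reason for this?

To be clear, the question is not about the task solved by the code above (which could be done far more elegantly), but only about the efficiency difference between a plain loop and the same loop wrapped in a function.

Tags: r, microbenchmark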

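Solution

The explanation is that functions are compiled "just in time", whereas the bare expression is interpreted. See ?compiler::enableJIT for a description.

If you want to demonstrate the difference, run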

compiler::enableJIT(0)

before any of your code (including the creation of the for_loop function). This disables JIT compiling for the rest of that session. Then the timing will be much more similar for the two sets of code.
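A minimal sketch of that experiment, using only packages that ship with R, might look like the following. The input vector, the repetition count of 50, and the name loop_in_function are illustrative assumptions rather than part of the original benchmark, and system.time is used here in place of microbenchmark to keep the example dependency-free.

```r
# Turn the JIT off first; enableJIT() returns the previous level so it can be restored.
old_level <- compiler::enableJIT(0)

n <- seq_len(10^4)  # illustrative input, not the sample from the question

# Defined after JIT was disabled, so it stays interpreted.
loop_in_function <- function(n) {
  out <- integer(length(n))
  for (k in seq_along(out)) out[k] <- if ((k %% 2) == 0) 0L else 1L
  out
}

# The same loop as a bare expression, repeated 50 times.
t_bare <- system.time(for (i in 1:50) {
  out <- integer(length(n))
  for (k in seq_along(out)) out[k] <- if ((k %% 2) == 0) 0L else 1L
})

# The loop wrapped in a function, repeated 50 times.
t_fun <- system.time(for (i in 1:50) res <- loop_in_function(n))

# Elapsed seconds for both variants; with JIT off they should be of similar size.
print(c(bare = t_bare[["elapsed"]], fun = t_fun[["elapsed"]]))

compiler::enableJIT(old_level)  # restore the previous JIT level
```

Exact timings depend on the machine and the R version, so no expected numbers are given here.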

You have to do this before the creation of the for_loop function, because once it gets compiled by the JIT compiler, it will stay compiled, whether JIT is enabled or not.
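The point about compiled functions staying compiled can be checked with a small sketch. Here compiler::cmpfun is used to compile a function explicitly, as a stand-in for what the JIT would do on an early call; has_bytecode is a hypothetical helper (not part of base R) that relies on the fact that printing a byte-compiled closure shows a `<bytecode: ...>` line.

```r
# Hypothetical helper (not base R): TRUE if printing f shows a bytecode tag.
has_bytecode <- function(f) {
  any(grepl("<bytecode", capture.output(print(f)), fixed = TRUE))
}

f <- function(x) { s <- 0; for (v in x) s <- s + v; s }

# Explicit byte compilation, standing in for compilation by the JIT.
f_compiled <- compiler::cmpfun(f)

old_level <- compiler::enableJIT(0)  # JIT is off from here on

# Same body, but defined and called only after the JIT was disabled.
g <- function(x) { s <- 0; for (v in x) s <- s + v; s }
invisible(g(1:10))
invisible(g(1:10))

still_compiled <- has_bytecode(f_compiled)  # bytecode survives enableJIT(0)
never_compiled <- has_bytecode(g)           # calls no longer trigger compilation

compiler::enableJIT(old_level)  # restore the previous JIT level
```

Under these assumptions, still_compiled is TRUE and never_compiled is FALSE, which is why enableJIT(0) has to come before for_loop is created.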
