Strange behaviour with data.table and serialization

Problem description

Here is a simple, reproducible example:

# Case 1: the result contains no closure, and b is never used.
fobj1 <- function(a, b) {
  list(a)
}

make1 <- function() {
  data <- data.table::data.table(1:1e8)
  a <- 1; b <- 2
  fobj1(a, b)
}

tmp <- make1()
print(object.size(serialize(tmp, connection = NULL)), units = 'Kb')



# Case 2: the result contains a closure, and both a and b are used in list().
fobj2 <- function(a, b) {
  f <- function() {NULL}
  list(a, b, 'f' = f)
}

make2 <- function() {
  data <- data.table::data.table(1:1e8)
  a <- 1; b <- 2
  fobj2(a, b)
}

tmp <- make2()
print(object.size(serialize(tmp, connection = NULL)), units = 'Kb')


# Case 3: the result contains a closure, but b is never used.
fobj3 <- function(a, b) {
  f <- function() {NULL}
  list(a, 'f' = f)
}

make3 <- function() {
  data <- data.table::data.table(1:1e8)
  a <- 1; b <- 2
  fobj3(a, b)
}

tmp <- make3()
print(object.size(serialize(tmp, connection = NULL)), units = 'Kb')

If I source this, the results are:

0.1 Kb
22.6 Kb
390647.9 Kb

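Clearly, data is somehow included by reference in the list in the last example. The only difference from the second example is that I did not include b in list()! Weird?!?

Can someone reproduce and explain this?

data.table version: data.table_1.12.4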

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Tags: r, serialization, data.table

Solution
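
A likely explanation is R's lazy evaluation rather than anything specific to data.table. Function arguments are passed as promises, and an unevaluated promise keeps a reference to the frame of the caller. In fobj3, b is never evaluated, so the closure f carries the frame of fobj3, which holds the pending promise for b, which in turn points at the frame of make3, where data lives. serialize() follows that chain and writes out the whole table: 1e8 integers at 4 bytes each is 390625 Kb, which accounts for nearly all of the 390647.9 Kb reported. In fobj2, list(a, b, ...) evaluates b, so the promise no longer needs the calling frame. In fobj1 no closure is returned, so no environment is serialized at all.

The sketch below reproduces the effect in base R under that assumption. The names make_closure, caller, big and force_b are hypothetical and exist only for illustration, and a plain numeric vector stands in for the data.table.

```r
# Hypothetical helper: returns a closure and optionally forces the unused argument b.
make_closure <- function(a, b, force_b = FALSE) {
  if (force_b) force(b)  # evaluating b means its promise no longer references the caller's frame
  f <- function() NULL
  list(a, f = f)
}

# Hypothetical caller: holds a large local object, like data in make3().
caller <- function(force_b) {
  big <- numeric(1e5)    # roughly 800 KB of doubles; stands in for the data.table
  a <- 1; b <- 2
  make_closure(a, b, force_b = force_b)
}

# Compare the serialized sizes (in bytes) with and without forcing b.
size_lazy   <- length(serialize(caller(FALSE), connection = NULL))
size_forced <- length(serialize(caller(TRUE),  connection = NULL))
print(c(lazy = size_lazy, forced = size_forced))
```

If this reading is right, adding force(b) as the first line of fobj3 should bring the third result down to roughly the size of the second one. Another option, when the closure does not need any of the surrounding variables, would be to give it a smaller enclosing environment before serializing.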

