data.table size and the datatable.alloccol option

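Question

The dataset I am working with is not very large, but it is wide. It currently has 10,854 columns, and I would like to add roughly 10,000 to 11,000 more. It has only 760 rows.

When I try to do this (by applying a function to a subset of the existing columns), I get the following: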

Warning message:
In `[.data.table`(setDT(Final), , `:=`(c(paste0(vars, ".xy_diff"),  :
  truelength (30854) is greater than 10,000 items over-allocated (length = 10854). See ?truelength. If you didn't set the datatable.alloccol option very large, please report to data.table issue tracker including the result of sessionInfo().

I tried playing around with setalloccol, but I got something similar. For example:

setalloccol(Final, 40960)
Error in `[.data.table`(x, i, , ) : 
  getOption('datatable.alloccol') should be a number, by default 1024. But its type is 'language'.
In addition: Warning message:
In setalloccol(Final, 40960) :
  tl (51894) is greater than 10,000 items over-allocated (l = 21174). If you didn't set the datatable.alloccol option to be very large, please report to data.table issue tracker including the result of sessionInfo().

Is there a way around this problem?

Many thanks.

Edit:

In reply to Roland's comment, this is what I am doing:

vars <- c(colnames(FinalTable_0)[271:290], colnames(FinalTable_0)[292:dim(FinalTable_0)[2]]) # <- variables I want to operate on
# FinalTable_0 is a previous table I use to collect the roots of the variables I want to work with
difference <- function(root) lapply(root, function(z) paste0("get('", z, ".x') - get('", z, ".y')"))
ratio <- function(root) lapply(root, function(z) paste0("get('", z, ".x') / get('", z, ".y')"))
# proceed to the computation
setDT(Final)[ , c(paste0(vars,".xy_diff"), paste0(vars,".xy_ratio")) := lapply(c(difference(vars), ratio(vars)), function(x) eval(parse(text = x)))]

Tags: r, data.table, size

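Solution


I tried the solution Roland proposed, but I was not entirely satisfied with it. It works, but I don't like the idea of transposing my data.

In the end, I simply split the original data.table into several smaller ones, ran the computation on each one separately, and joined them at the end. It is quick and simple: no juggling variables, no working out which are ids and which are measures, and no melting and reshaping. I just prefer it that way.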
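
For readers who want to try the same idea, here is a minimal sketch of the split, compute, and join approach. It is not the exact code I ran: the toy table, the roots a, b and c, the helper compute_chunk and the chunk_size value are all made up for illustration, and the only assumption carried over from the question is that each root has a ".x" and a ".y" column. Each piece is built as a small separate data.table and everything is bound together once at the end, so no single table is grown by thousands of columns through `:=`.

```r
library(data.table)

# Toy stand-in for the wide table (hypothetical data): 5 rows and three
# roots, each with a ".x" and a ".y" column.
set.seed(1)
roots <- c("a", "b", "c")
Final <- data.table(id = 1:5)
for (r in roots) {
  set(Final, j = paste0(r, ".x"), value = runif(5, 1, 2))
  set(Final, j = paste0(r, ".y"), value = runif(5, 1, 2))
}

# Hypothetical helper: compute the difference and ratio columns for one
# chunk of roots and return them as a new, small data.table.
compute_chunk <- function(dt, chunk) {
  diffs  <- lapply(chunk, function(z) dt[[paste0(z, ".x")]] - dt[[paste0(z, ".y")]])
  ratios <- lapply(chunk, function(z) dt[[paste0(z, ".x")]] / dt[[paste0(z, ".y")]])
  cols <- c(diffs, ratios)
  names(cols) <- c(paste0(chunk, ".xy_diff"), paste0(chunk, ".xy_ratio"))
  as.data.table(cols)
}

# Split the roots into chunks (chunk_size is arbitrary here; with real data
# a few thousand roots per chunk would be a natural choice).
chunk_size <- 2
chunks <- split(roots, ceiling(seq_along(roots) / chunk_size))

# Compute each piece separately, then bind all pieces to the original
# table in one step. Rows keep their order, so a column bind is enough.
pieces <- lapply(chunks, function(ch) compute_chunk(Final, ch))
result <- do.call(cbind, c(list(Final), unname(pieces)))
```

Because the rows are never reordered, the final step is a plain column bind rather than a keyed join. If the pieces were computed on differently ordered copies, joining on an id column would be the safer choice.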

