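r - Is it possible to set the column types in advance when using `data.table`'s DT[i, j, by]?
Problem description
I am trying to compute the correlation between two variables for several different groups (e.g. DT[, cor.test(var1, var2), group]). This works fine when I use cor.test(var1, var2, method = 'pearson'), but it throws an error when I use cor.test(var1, var2, method = 'spearman').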
library(data.table)
DT <- as.data.table(iris)
# works perfectly
DT[,cor.test(Sepal.Length,Sepal.Width, method = 'pearson'), Species]
# Species statistic parameter p.value estimate null.value
# 1: setosa 7.680738 48 6.709843e-10 0.7425467 0
# 2: setosa 7.680738 48 6.709843e-10 0.7425467 0
# 3: versicolor 4.283887 48 8.771860e-05 0.5259107 0
# 4: versicolor 4.283887 48 8.771860e-05 0.5259107 0
# 5: virginica 3.561892 48 8.434625e-04 0.4572278 0
# 6: virginica 3.561892 48 8.434625e-04 0.4572278 0
# alternative method
# 1: two.sided Pearson's product-moment correlation
# 2: two.sided Pearson's product-moment correlation
# 3: two.sided Pearson's product-moment correlation
# 4: two.sided Pearson's product-moment correlation
# 5: two.sided Pearson's product-moment correlation
# 6: two.sided Pearson's product-moment correlation
# data.name conf.int
# 1: Sepal.Length and Sepal.Width 0.5851391
# 2: Sepal.Length and Sepal.Width 0.8460314
# 3: Sepal.Length and Sepal.Width 0.2900175
# 4: Sepal.Length and Sepal.Width 0.7015599
# 5: Sepal.Length and Sepal.Width 0.2049657
# 6: Sepal.Length and Sepal.Width 0.6525292
# error
DT[,cor.test(Sepal.Length,Sepal.Width, method = 'spearman'), Species]
# Error in `[.data.table`(DT, , cor.test(Sepal.Length, Sepal.Width, method = "spearman"), :
# Column 2 of j's result for the first group is NULL. We rely on the column types of the first
# result to decide the type expected for the remaining groups (and require consistency). NULL
# columns are acceptable for later groups (and those are replaced with NA of appropriate type
# and recycled) but not for the first. Please use a typed empty vector instead, such as
# integer() or numeric().
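Question:
I know there are workarounds for this particular example, but is it possible to tell data.table in advance what the column types are for any call of the form DT[i, j, by = 'something']?
Solution
If you want to keep all the columns rather than dropping the ones that are NULL, you can manually set the class of the "problem" column (in this case, the column causing the problem is `parameter`). This is preferable to dropping the NULLs when the column does contain values for some groups but not for others.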
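DT[, {
  res <- cor.test(Sepal.Length, Sepal.Width, method = 'spearman')
  # parameter is NULL for Spearman; give it a type so data.table can build the column
  class(res$parameter) <- 'integer'
  # return the full cor.test result; the typed empty column is filled with NA
  res
}, Species]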
# Species statistic parameter p.value estimate null.value alternative method data.name
#1: setosa 5095.097 NA 2.316710e-10 0.7553375 0 two.sided Spearman's rank correlation rho Sepal.Length and Sepal.Width
#2: versicolor 10045.855 NA 1.183863e-04 0.5176060 0 two.sided Spearman's rank correlation rho Sepal.Length and Sepal.Width
#3: virginica 11942.793 NA 2.010675e-03 0.4265165 0 two.sided Spearman's rank correlation rho Sepal.Length and Sepal.Width
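The same idea can be generalized so that the problem column does not have to be named by hand. The following is only a sketch: `fill_null` is a hypothetical helper written for this example, not part of data.table, and it assumes the caller picks a `fill` value whose type matches what the column would hold in groups where it is present (integer here, as in the answer above). `exact = FALSE` is passed only to avoid the warning about ties.

```r
library(data.table)

# Hypothetical helper (not a data.table function): replace every NULL
# element of a list with a typed NA, so that each element of j's result
# has a known type even in the first group.
fill_null <- function(x, fill = NA_real_) {
  lapply(x, function(el) if (is.null(el)) fill else el)
}

DT <- as.data.table(iris)

# cor.test() returns parameter = NULL for Spearman; fill_null() turns it
# into NA_integer_ before data.table inspects the first group's result.
res <- DT[, fill_null(cor.test(Sepal.Length, Sepal.Width,
                               method = "spearman", exact = FALSE),
                      fill = NA_integer_),
          by = Species]
```

Note that `lapply` drops the `htest` class and returns a plain list, which is all that `j` needs. If different NULL columns should have different types, a single `fill` value will not be enough and the columns would need to be handled individually, as in the answer above.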