r - ranger with classProbs = TRUE: all the Accuracy metric values are missing
Problem description
library(dplyr)
library(caret)
library(doParallel)

cl <- makeCluster(3, outfile = '')
registerDoParallel(cl)
set.seed(2019)
fit1 <- train(x = X_train %>% head(1000) %>% as.matrix(),
              y = y_train %>% head(1000),
              method = 'ranger',
              verbose = TRUE,
              trControl = trainControl(method = 'oob',
                                       verboseIter = TRUE,
                                       allowParallel = TRUE,
                                       classProbs = TRUE),
              tuneGrid = expand.grid(mtry = 2:3,
                                     min.node.size = 1,
                                     splitrule = 'gini'),
              num.tree = 100,
              metric = 'Accuracy',
              importance = 'permutation')
stopCluster(cl)
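The code above produces the following error:
Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :2     NA's   :2
Error: Stopping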
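I have searched for this error and found that it can have many different causes. Unfortunately, none of the ones I found apply to my case. Here the problem seems to be classProbs = TRUE: when I remove it, so that the default value FALSE is used, the model trains successfully. Based on the documentation, however, I don't see why it should be a problem:
a logical; should class probabilities be computed for classification models (along with predicted values) in each resample?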
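One possible explanation, offered as an assumption rather than something the thread confirms: when classProbs = TRUE, caret appears to request a probability forest from ranger, and a probability forest stores its out-of-bag predictions as a matrix of class probabilities instead of a factor of labels, so an OOB accuracy cannot be read off directly. The sketch below uses ranger on its own with hypothetical toy data (not the asker's X_train) to show the difference in the shape of the OOB predictions.

```r
library(ranger)

set.seed(2019)
# Hypothetical toy two-class data, standing in for the asker's X_train / y_train
n <- 200
toy <- data.frame(V1 = runif(n, 0, 70), V2 = runif(n, 0, 70))
toy$y <- factor(ifelse(toy$V1 + toy$V2 + rnorm(n, sd = 10) > 70, "plus", "minus"),
                levels = c("plus", "minus"))

# Classification forest: OOB predictions are a factor of class labels
rf_class <- ranger(y ~ ., data = toy, num.trees = 100, mtry = 2,
                   min.node.size = 1, splitrule = "gini")

# Probability forest (assumed to be what caret asks for when classProbs = TRUE):
# OOB predictions are a numeric matrix with one column per class
rf_prob <- ranger(y ~ ., data = toy, num.trees = 100, mtry = 2,
                  min.node.size = 1, splitrule = "gini", probability = TRUE)

print(class(rf_class$predictions))  # a factor
print(dim(rf_prob$predictions))     # n rows, one column per class

# An OOB accuracy can still be derived by hand from the probability matrix
oob_label <- factor(
  colnames(rf_prob$predictions)[max.col(rf_prob$predictions, ties.method = "first")],
  levels = levels(toy$y)
)
oob_accuracy <- mean(oob_label == toy$y)
print(oob_accuracy)
```

For a probability forest, ranger reports the Brier score in prediction.error rather than the misclassification rate, which is another reason the two kinds of forest are not interchangeable when an accuracy value is expected.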
Data sample:
X_train <- structure(list(V5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V1 = c(41.5,
5.3, 44.9, 58.7, 67.9, 56.9, 3.7, 43.4, 38.6, 34.2, 42.3, 29.1,
27.6, 44.2, 55.6, 53.7, 48, 58.4, 54, 7.1, 35.9, 36, 61.2, 24.1,
20.3, 10.8, 13, 69.4, 71.5, 45.6, 34.4, 17.1, 30.1, 68.9, 25.1,
37.4, 55.5, 58.9, 49.8, 47.2, 29.5, 19.9, 24.1, 27, 33.3, 41.9,
33.2, 27.9, 48.4, 41.2), V2 = c(33.1, 35.4, 66.2, 1.8, 5, -0.9,
32.8, 35.8, 36, 4, 65.5, 64, 61, 68.9, 69.3, 59.7, 29.8, 24.4,
62.7, 12.2, 6, -1.2, 63.5, 7.5, 22.9, 40.5, 47.3, 1.6, -1.5,
33.3, 53.3, 23.7, 2.7, 61, 2.4, 13.5, 8.1, 55.1, 29.6, 36.8,
26.8, 26, 30.8, 53.8, 10.6, 1.9, 10.2, 29.1, 51.4, 33.1), V3 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), V4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -50L))
y_train <- structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("plus", "minus"), class = "factor")
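Solution
Based on the replies to https://stats.stackexchange.com/questions/23763/is-there-a-way-to-disable-the-parameter-tuning-grid-feature-in-caret, I tried setting the trainControl "method" to "none" as suggested there, and that allowed the code to run successfully. The second answer to that question implies that random forest methods should not be used with a complex tuning grid. (I also set the "mtry" parameter to a single value, but I am not sure that was necessary.) (I had earlier tried removing the parallel cluster, which had no effect on the error.) Now that the code no longer throws an error, you can add features back.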
fit1 <- train(form = y ~ ., x = X_train[, 2:3],
              y = factor(y_train),
              method = 'ranger',
              verbose = TRUE,
              trControl = trainControl(method = "none"),
              tuneGrid = expand.grid(mtry = 2,
                                     min.node.size = 1,
                                     splitrule = 'gini'),
              num.tree = 100,
              metric = 'Accuracy',
              importance = 'permutation')
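As a minimal sketch of adding one feature back, the call below keeps method = "none" and re-enables classProbs = TRUE, then asks the fitted model for class probabilities. The data (X_toy, y_toy) and the object name fit_prob are hypothetical stand-ins, and the thread does not confirm that this is the configuration the asker ended up with.

```r
library(caret)
library(ranger)

set.seed(2019)
# Hypothetical toy data standing in for X_train / y_train
n <- 200
X_toy <- data.frame(V1 = runif(n, 0, 70), V2 = runif(n, 0, 70))
y_toy <- factor(ifelse(X_toy$V1 + X_toy$V2 + rnorm(n, sd = 10) > 70, "plus", "minus"),
                levels = c("plus", "minus"))

# No resampling, a single row in the tuning grid, class probabilities enabled
fit_prob <- train(x = X_toy,
                  y = y_toy,
                  method = "ranger",
                  trControl = trainControl(method = "none", classProbs = TRUE),
                  tuneGrid = expand.grid(mtry = 2,
                                         min.node.size = 1,
                                         splitrule = "gini"),
                  num.trees = 100)

# Class probabilities (one column per class) and hard class labels
probs  <- predict(fit_prob, newdata = X_toy, type = "prob")
labels <- predict(fit_prob, newdata = X_toy)
print(head(probs))
print(table(labels))
```

Because method = "none" skips resampling, train() reports no Accuracy or Kappa here; a performance estimate would have to come from a held-out set or from a resampling method other than "oob".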