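r - How to select multiple models based on different metrics with workflow_set (tidymodels)
Problem description
I ran the following models successfully, and I need to select the best two (for one or more metrics). The models differ only in the recipe object, which applies a different step for imbalanced data (none, SMOTE, ROSE, upsampling, step_adasyn). I am interested in selecting several models, in selecting the best two, and in selecting by imbalance function.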
```r
library(tidymodels)
library(themis)   # step_smote(), step_rose(), step_upsample(), step_adasyn()

metrics_lasso <- metric_set(yardstick::sensitivity, yardstick::specificity,
                            yardstick::precision, yardstick::recall)

folds <- vfold_cv(data_train, v = 3, strata = class)

# Baseline recipe: no handling of the class imbalance
rec_obj_all <- data_train %>%
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>%
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors())

# Same preprocessing plus one imbalance step each; these steps run
# after step_dummy(), so every predictor is numeric by then
rec_obj_all_s  <- rec_obj_all %>% step_smote(class)
rec_obj_all_r  <- rec_obj_all %>% step_rose(class)
rec_obj_all_up <- rec_obj_all %>% step_upsample(class)
rec_obj_all_ad <- rec_obj_all %>% step_adasyn(class)

# Lasso logistic regression: pure L1 penalty (mixture = 1), penalty tuned
lasso_mod1 <- logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

tictoc::tic()

# Parallel backend: leave 4 physical cores free, but start at least 1 worker
all_cores <- parallel::detectCores(logical = FALSE)
library(doFuture)
registerDoFuture()
cl <- parallel::makeCluster(max(1, all_cores - 4))
plan(cluster, workers = cl)

# 5 recipes x 1 model = 5 workflows
balances <-
  workflow_set(
    preproc = list(unba = rec_obj_all, b_sm = rec_obj_all_s, b_ro = rec_obj_all_r,
                   b_up = rec_obj_all_up, b_ad = rec_obj_all_ad),
    models = list(lasso_mod1),
    cross = TRUE
  )

grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = FALSE
  )

# Tune every workflow over 25 penalty candidates on the same folds
grid_results <-
  balances %>%
  workflow_map(
    seed = 1503,
    resamples = folds,
    grid = 25,
    metrics = metrics_lasso,
    control = grid_ctrl,
    verbose = TRUE
  )

parallel::stopCluster(cl)
tictoc::toc()
```
I don't understand which function to use to select the best two (or more) models with the workflowsets package.
Solution
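workflowsets has convenience functions for ranking results and extracting the best one, but if you have a more specific use case like the one you describe here (the best two, or the best according to more complex filtering), just keep using tidyr and dplyr verbs to work with your results in grid_results.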
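A minimal sketch of those convenience functions, assuming the tidymodels, themis and glmnet packages are installed. It runs a deliberately small version of the question's setup (modeldata's two_class_dat as a stand-in for data_train, two recipes instead of five, grid = 5), then uses rank_results() with select_best = TRUE to keep the two best workflows by sensitivity, and finalizes each one with its best penalty via extract_workflow_set_result(), select_best() and finalize_workflow(). The dataset, the object names and sensitivity as the ranking metric are illustrative choices, not part of the original answer; with the question's objects the same calls apply to grid_results.

```r
library(tidymodels)  # rsample, recipes, parsnip, workflowsets, tune, yardstick
library(themis)      # step_smote()

# Illustrative stand-in for data_train (outcome column: Class)
data(two_class_dat, package = "modeldata")
set.seed(123)
folds <- vfold_cv(two_class_dat, v = 3, strata = Class)

rec_base  <- recipe(Class ~ ., data = two_class_dat) %>%
  step_normalize(all_numeric_predictors())
rec_smote <- rec_base %>% step_smote(Class)

lasso <- logistic_reg(penalty = tune(), mixture = 1) %>% set_engine("glmnet")
mets  <- metric_set(yardstick::sensitivity, yardstick::specificity)

res <- workflow_set(
  preproc = list(unba = rec_base, b_sm = rec_smote),
  models  = list(lasso)
) %>%
  workflow_map(seed = 1503, resamples = folds, grid = 5, metrics = mets)

# select_best = TRUE keeps each workflow's best candidate (by rank_metric),
# so `rank` ranks workflows rather than individual penalty values
best_two <- rank_results(res, rank_metric = "sensitivity", select_best = TRUE) %>%
  filter(.metric == "sensitivity", rank <= 2)

# Finalize each selected workflow with its best penalty by sensitivity
top_ids <- unique(best_two$wflow_id)
final_wfs <- lapply(top_ids, function(id) {
  best_params <- extract_workflow_set_result(res, id) %>%
    select_best(metric = "sensitivity")
  finalize_workflow(extract_workflow(res, id), best_params)
})
names(final_wfs) <- top_ids

best_two
```

With only two workflows, rank <= 2 keeps both; on the question's five-workflow grid_results the same filter drops the other three. Because rank_metric also decides which candidate is "best" inside each workflow, running the ranking once per metric gives a separate shortlist for each metric.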
You can unnest() the results and/or use the output of rank_results() to get what you are interested in.
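For the "best two per metric" case, a flat metrics table plus a couple of dplyr verbs is usually enough. The sketch below assumes the table has the columns that collect_metrics() returns for a tuned workflow set (wflow_id, .config, .metric, mean, ...) and that every metric is "higher is better", which holds for sensitivity, specificity, precision and recall. The helper name top_workflows() and the toy numbers are hypothetical, made up only to show the logic.

```r
library(dplyr)

# Hypothetical helper: the `n` best workflows for each metric in `metrics`.
# Assumes collect_metrics()-style columns and "higher is better" metrics.
top_workflows <- function(metrics_tbl, metrics, n = 2) {
  metrics_tbl %>%
    filter(.metric %in% metrics) %>%
    # step 1: best tuning candidate (.config) of each workflow, per metric
    group_by(.metric, wflow_id) %>%
    slice_max(mean, n = 1, with_ties = FALSE) %>%
    # step 2: the n best workflows for each metric
    group_by(.metric) %>%
    slice_max(mean, n = n, with_ties = FALSE) %>%
    ungroup() %>%
    arrange(.metric, desc(mean))
}

# Made-up toy input shaped like collect_metrics() output (not real results)
toy <- tibble(
  wflow_id = rep(c("unba_logistic_reg", "b_sm_logistic_reg",
                   "b_ro_logistic_reg"), each = 2),
  .config  = rep(c("Preprocessor1_Model01", "Preprocessor1_Model02"), times = 3),
  .metric  = "sensitivity",
  mean     = c(0.60, 0.62, 0.75, 0.71, 0.68, 0.70)
)

top_workflows(toy, "sensitivity", n = 2)

# With the question's objects (not run here):
# top_workflows(collect_metrics(grid_results),
#               c("sensitivity", "precision"), n = 2)
```

On the toy data this keeps b_sm_logistic_reg (0.75) and b_ro_logistic_reg (0.70); unba_logistic_reg's best candidate (0.62) comes third. The two-stage slice is the design point: without step 1, a single workflow could take both slots with two of its own penalty values.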