How to select multiple models with workflow_set (tidymodels) based on different metrics

Problem description

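I ran the following models successfully, and I need to select the best two (for one or more metrics). The models differ only in the recipe object, which applies a different step for handling imbalanced data (none, smote, rose, upsample, step_adasyn). I am interested in selecting several models, selecting the best two, and selecting by imbalance method.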

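```
# The start of the code was cut off in the source. The library calls below are
# assumed (themis provides step_smote, step_rose, step_upsample, step_adasyn).
library(tidymodels)
library(themis)

# The name metrics_lasso is taken from its use in workflow_map() below; any
# metrics listed before sensitivity in the original call are unknown.
metrics_lasso <- metric_set(yardstick::sensitivity, yardstick::specificity,
                            yardstick::precision, yardstick::recall)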
folds <- vfold_cv(data_train, v = 3, strata = class)

rec_obj_all <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) 

rec_obj_all_s <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_smote(class)

rec_obj_all_r <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors())  %>%
  step_rose(class)

rec_obj_all_up <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_upsample(class)

rec_obj_all_ad <- data_train %>% 
  recipe(class ~ .) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_adasyn(class)

lasso_mod1 <- logistic_reg(penalty = tune(),
                          mixture = 1) %>%
  set_engine("glmnet")

tictoc::tic()

all_cores <- parallel::detectCores(logical = FALSE)
library(doFuture)
registerDoFuture()
cl <- parallel::makeCluster(max(1, all_cores - 4))  # never request fewer than one worker
plan(cluster, workers = cl)

balances <- 
  workflow_set(
    preproc = list(unba = rec_obj_all, b_sm = rec_obj_all_s, b_ro = rec_obj_all_r,
                   b_up = rec_obj_all_up, b_ad = rec_obj_all_ad), 
    models = list(lasso_mod1),
    cross = TRUE
  )

grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = FALSE
  )

grid_results <-
  balances %>%
  workflow_map(
    seed = 1503,
    resamples = folds,
    grid = 25,
    metrics = metrics_lasso,
    control = grid_ctrl,
    verbose = TRUE)
    

parallel::stopCluster( cl )

tictoc::toc()
```

I don't understand which function in the workflowsets package selects the best two or more models.

Tags: r, classification, workflow, glmnet, tidymodels

Solution


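workflowsets has convenience functions for ranking results and extracting the best result, but for a more specific use case like the one you describe here (the best two, or the best according to more complex filtering), go ahead and handle the results in grid_results with tidyr and dplyr verbs. You can unnest() the results and/or work with the output of rank_results() to get what you are interested in.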
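As a rough sketch of that approach, the helper below keeps the best `n` workflows for a single metric. `top_n_workflows` and its arguments are hypothetical names, not part of workflowsets. It assumes the input has the `wflow_id`, `.metric` and `mean` columns that `rank_results()` returns, and that a higher value is better, which holds for the four metrics used in the question.

```r
library(dplyr)

# Hypothetical helper, not part of workflowsets.
# `ranked` is assumed to be shaped like the output of rank_results():
# one row per workflow, tuning configuration and metric.
top_n_workflows <- function(ranked, metric_name, n = 2) {
  ranked %>%
    filter(.metric == metric_name) %>%              # keep one metric
    group_by(wflow_id) %>%
    slice_max(mean, n = 1, with_ties = FALSE) %>%   # best configuration per workflow
    ungroup() %>%
    slice_max(mean, n = n, with_ties = FALSE)       # best n workflows overall
}
```

With the objects from the question, this could be called as `top_n_workflows(rank_results(grid_results), "sensitivity")` and repeated for each metric of interest. To restrict the comparison to particular imbalance methods, filter on `wflow_id` first, since the names given in `preproc` (`unba`, `b_sm`, and so on) become the prefix of each workflow id.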

