How to use the sum of specificity and sensitivity as the summary metric for training in R caret?

Problem description

I am using caret with xgbTree in R:

fitControl_2 <- trainControl(## 3-fold CV, repeated twice
  method = "repeatedcv",
  number = 3,
  repeats = 2,
  verboseIter = TRUE
)

xgboost <- train(interest_factor ~ .,
                 data = train_set_balanced,
                 method = "xgbTree",
                 trControl = fitControl_2,
                 ## Specify which metric to optimize
                 metric = "Kappa")

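Is there a way to use sensitivity + specificity, or Youden's index, as the metric instead of Kappa? I know you can use a custom function, but it is not clear to me how to build one correctly in this case.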

Tags: r, xgboost, r-caret

Solution

Here is a summary function that uses the sum Sens + Spec as the selection metric:

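# Summary function for caret::trainControl(): returns Sens + Spec as "j".
# Youden's index is Sens + Spec - 1, so maximizing "j" selects the same model.
youdenSumary <- function(data, lev = NULL, model = NULL) {
  # Sensitivity and specificity are only used here for two-class outcomes
  if (length(lev) > 2) {
    stop(paste("Your outcome has", length(lev), "levels. The youdenSumary() function isn't appropriate."))
  }
  if (!all(levels(data[, "pred"]) == lev)) {
    stop("levels of observed and predicted data do not match")
  }
  # lev[1] is treated as the positive class, lev[2] as the negative class
  Sens <- caret::sensitivity(data[, "pred"], data[, "obs"], lev[1])
  Spec <- caret::specificity(data[, "pred"], data[, "obs"], lev[2])
  j <- Sens + Spec
  out <- c(j, Spec, Sens)
  names(out) <- c("j", "Spec", "Sens")
  out
}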
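To see what such a summary function receives and computes, here is a minimal sketch on made-up data. The data frame `toy` and the variable names `j_sum` and `youden` are assumptions for illustration only; the sketch relies on caret passing a data frame with factor columns `obs` and `pred`, plus the class levels in `lev`.

```r
library(caret)

# Toy stand-in for the `data` argument of a summary function:
# factor columns `obs` and `pred` that share the same levels (made-up values).
lev <- c("M", "R")
toy <- data.frame(
  obs  = factor(c("M", "M", "M", "R", "R", "R", "R"), levels = lev),
  pred = factor(c("M", "M", "R", "R", "R", "R", "M"), levels = lev)
)

# Sensitivity: share of observed "M" predicted as "M" (2 of 3 here)
sens <- caret::sensitivity(toy$pred, toy$obs, positive = lev[1])
# Specificity: share of observed "R" predicted as "R" (3 of 4 here)
spec <- caret::specificity(toy$pred, toy$obs, negative = lev[2])

j_sum  <- sens + spec   # the quantity used as the selection metric
youden <- j_sum - 1     # Youden's index; ranks models identically to j_sum
```

Because Youden's index differs from Sens + Spec only by a constant, either one can be returned as the metric without changing which tuning parameters are selected.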

To understand why it is defined this way, read the relevant section of the caret book. Some answers on SO that may also help:

Custom performance function in the caret package using predicted probabilities

Additional metrics in caret - PPV, sensitivity, specificity

Example:

library(caret)
library(mlbench)
data(Sonar)

fitControl <- trainControl(method = "cv",
                           number = 5,
                           summaryFunction = youdenSumary)
fit <- train(Class ~ .,
             data = Sonar,
             method = "rpart",
             metric = "j",
             tuneLength = 5,
             trControl = fitControl)

fit
#output
CART 

208 samples
 60 predictor
  2 classes: 'M', 'R' 

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 167, 166, 166, 166, 167 
Resampling results across tuning parameters:

  cp          j         Spec       Sens     
  0.00000000  1.394980  0.6100000  0.7849802
  0.01030928  1.394980  0.6100000  0.7849802
  0.05154639  1.387708  0.6300000  0.7577075
  0.06701031  1.398629  0.6405263  0.7581028
  0.48453608  1.215457  0.3684211  0.8470356

j was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.06701031.
