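cross-validation - How to use makeFeatSelWrapper together with the resample function in mlr
Problem description
I am fitting classification models for binary problems using the MLR package in R. For each model, I perform cross-validation with embedded feature selection using the "selectFeatures" function. As output, I retrieve the mean AUC over the test sets and the predictions. To do so, after getting some advice (Get predictions on test sets in MLR), I use the "makeFeatSelWrapper" function in combination with the "resample" function. The goal seems to be achieved, but the results are strange. With logistic regression as the classifier, I get an AUC of 0.5, which means that no variable was selected. This result is unexpected, because I get an AUC of 0.9824432 with this classifier using the method mentioned in the linked question. With a neural network as the classifier, I get an error message:
Error in sum(x) : invalid 'type' (list) of argument
What is wrong?
Here is the sample code: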
# 1. Find a synthetic dataset for supervised learning (two classes)
###################################################################
install.packages("mlbench")
library(mlbench)
data(BreastCancer)
# generate 1000 rows, 21 quantitative candidate predictors and 1 target variable
p<-mlbench.waveform(1000)
# convert list into dataframe
dataset<-as.data.frame(p)
# drop the third class to get 2 classes
dataset2 = subset(dataset, classes != 3)
# 2. Perform cross validation with embedded feature selection using logistic regression
#######################################################################################
library(BBmisc)
library(nnet)
library(mlr)
# Choice of data
mCT <- makeClassifTask(data =dataset2, target = "classes")
# Choice of algorithm i.e. logistic regression
mL <- makeLearner("classif.logreg", predict.type = "prob")
# Choice of cross-validations for folds
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
# Choice of feature selection method
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
# Choice of hold-out sampling between training and test within the fold
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
# 3. Perform cross validation with embedded feature selection using neural network
##################################################################################
library(BBmisc)
library(nnet)
library(mlr)
# Choice of data
mCT <- makeClassifTask(data =dataset2, target = "classes")
# Choice of algorithm i.e. neural network
mL <- makeLearner("classif.nnet", predict.type = "prob")
# Choice of cross-validations for folds
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
# Choice of feature selection method
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
# Choice of sampling between training and test within the fold
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
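Solution
If you run the logistic regression part of your code several times, you should also get the Error in sum(x) : invalid 'type' (list) of argument error. However, I find it odd that fixing a particular seed before resampling (e.g., set.seed(1)) does not ensure that the error consistently appears or consistently does not appear. The error occurs in the internal mlr code that prints the output of the feature selection to the console. A very simple workaround is to avoid printing that output by setting show.info = FALSE in makeFeatSelWrapper (see the code below). While this removes the error, whatever causes it may have other consequences, although it is possible that the error only affects the printing code.
When running your code, I only get AUCs above 0.90. Please find below your logistic regression code, slightly reorganized and with the workaround applied. I have added droplevels() to dataset2 to remove the empty level 3 from the factor, but this is unrelated to the workaround.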
library(mlbench)
library(mlr)
# Simulate 1000 rows of waveform data and keep only classes 1 and 2
p <- mlbench.waveform(1000)
dataset <- as.data.frame(p)
dataset2 <- subset(dataset, classes != 3)
# Drop the now-empty factor level 3
dataset2 <- droplevels(dataset2)
mCT <- makeClassifTask(data = dataset2, target = "classes")
ctrl <- makeFeatSelControlSequential(method = "sffs", maxit = NA, alpha = 0.001)
mL <- makeLearner("classif.logreg", predict.type = "prob")
inner <- makeResampleDesc("Holdout", stratify = TRUE)
# Workaround: show.info = FALSE suppresses the console output that triggers the error
lrn <- makeFeatSelWrapper(mL, resampling = inner, control = ctrl, show.info = FALSE)
# Uncomment this for the error to appear again. You might need to run the code a couple of times to see the error
# lrn <- makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
outer <- makeResampleDesc("CV", iters = 10, stratify = TRUE)
r <- resample(lrn, mCT, outer, extract = getFeatSelResult, measures = list(mlr::auc, mlr::acc, mlr::brier), models = TRUE)
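Since the question asks for the mean AUC over the test sets and for the test-set predictions, the following is a minimal sketch of how those values might be read from the resample result. The fixed seed, the reduced number of outer folds (3 instead of 10) and the variable names selected and pred are assumptions made here to keep the example short; they are not part of the original answer.

```r
library(mlbench)
library(mlr)

# Assumption: a fixed seed and only 3 outer folds, chosen to keep the sketch quick
set.seed(1)
dataset2 <- droplevels(subset(as.data.frame(mlbench.waveform(1000)), classes != 3))

mCT <- makeClassifTask(data = dataset2, target = "classes")
mL <- makeLearner("classif.logreg", predict.type = "prob")
ctrl <- makeFeatSelControlSequential(method = "sffs", maxit = NA, alpha = 0.001)
inner <- makeResampleDesc("Holdout", stratify = TRUE)
# show.info = FALSE applies the workaround described in the answer
lrn <- makeFeatSelWrapper(mL, resampling = inner, control = ctrl, show.info = FALSE)
outer <- makeResampleDesc("CV", iters = 3, stratify = TRUE)
r <- resample(lrn, mCT, outer, extract = getFeatSelResult,
              measures = list(mlr::auc, mlr::acc, mlr::brier), show.info = FALSE)

# Mean performance over the outer test sets
print(r$aggr)
# Per-fold performance on the outer test sets
print(r$measures.test)
# Features selected in each outer fold
selected <- lapply(r$extract, function(res) res$x)
print(selected)
# Test-set predictions collected from all outer folds
pred <- as.data.frame(getRRPredictions(r))
head(pred)
```

If a fold reports an AUC of exactly 0.5, checking the matching entry of selected shows whether the feature selection returned an empty feature set for that fold.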