Feature names stored in `object` and `newdata` are different! when using the mlr package

Problem description

I am trying to build a multilabel classification model with XGBoost. I have one that works with RF, but when I run the XGBoost code below, I get this error:

Error in predict.xgb.Booster(m, newdata = data.matrix(.newdata), ...) :
Feature names stored in 'object' and 'newdata' are different!

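Since I never assign the training data, the test data, or "newdata" myself (the splits come from the resampling description created with makeResampleDesc()), I don't know how to fix this error.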

Any ideas?
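One way to narrow this down is to compare the feature names a model was trained on with the column names of the data passed to predict. The following is a minimal base-R sketch, not part of mlr or xgboost: `compare_feature_names` is a hypothetical helper, and the two small data frames are made up for illustration. It assumes the check behind the error is sensitive to column order as well as to the set of names, which I have not verified for every xgboost version.

```r
# Hypothetical helper (not part of mlr or xgboost): report how two sets of
# feature names differ, including differences in order.
compare_feature_names <- function(train_names, new_names) {
  list(
    missing_in_new = setdiff(train_names, new_names),  # trained on, absent at predict time
    extra_in_new   = setdiff(new_names, train_names),  # present only at predict time
    same_set       = setequal(train_names, new_names), # same names, ignoring order
    same_order     = identical(train_names, new_names) # same names in the same order
  )
}

# Made-up example: identical columns, but in a different order
train <- data.frame(age = c(50, 61), sex = c(0, 1), l1 = c(TRUE, FALSE))
newd  <- data.frame(sex = c(1, 0), age = c(45, 52), l1 = c(FALSE, TRUE))

cmp <- compare_feature_names(colnames(train), colnames(newd))
str(cmp)
```

If `same_set` is TRUE but `same_order` is FALSE, the columns only need reordering; if names are missing or extra, the features themselves differ between training and prediction.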

Sample data

library(mlr)
library(xgboost)
lab <- c("l1","l2","l3","l4","l5")
age <- c(round(rnorm(120,mean = 50,sd = 10)))
sex <- c(round(rnorm(120,mean = 0.5,sd = 0.2)))
l1 <- as.logical(c(round(rnorm(120,mean = 0.5,sd = 0.2))))
l2 <- as.logical(c(round(rnorm(120,mean = 0.5,sd = 0.2))))
l3 <- as.logical(c(round(rnorm(120,mean = 0.5,sd = 0.2))))
l4 <- as.logical(c(round(rnorm(120,mean = 0.5,sd = 0.2))))
l5 <- as.logical(c(round(rnorm(120,mean = 0.5,sd = 0.2))))
data <- as.data.frame(cbind(age,sex,l1,l2,l3,l4,l5))
data[,lab]<- lapply(data[,lab],FUN = as.logical)

Create the learner

learner <- "classif.xgboost"
lrn <- makeLearner(learner, objective = "multi:softprob") 
lrn <- makeMultilabelClassifierChainsWrapper(lrn, order = NULL) 
lrn <- setPredictType(lrn,"prob")

Create the grid

ps <- makeParamSet(
  makeDiscreteParam("max_depth", values = c(1,3,5)),
  makeDiscreteParam("eta",values = c(0.001,0.01,0.1))
)

Set up the resampling method and tuning control

ctrl <- makeTuneControlGrid()
rdesc <- makeResampleDesc(method = "CV",iters = 5L)
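For readers wondering where the training and test data come from: a 5-fold CV description means the rows are partitioned into five folds, and each fold serves once as the test set while the remaining rows form the training set. The sketch below illustrates that idea in base R only. It is a conceptual illustration under that assumption, not mlr's actual implementation, and the names `fold_id` and `splits` are made up.

```r
# Conceptual sketch of 5-fold cross-validation splits (not mlr internals)
set.seed(42)
n <- 120  # number of rows, matching the sample data
k <- 5    # number of folds, matching iters = 5L

# Assign each row to one of k folds at random, with balanced fold sizes
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold, the held-out rows are the test set and the rest the training set
splits <- lapply(seq_len(k), function(i) {
  list(train = which(fold_id != i), test = which(fold_id == i))
})

# Number of test rows per fold
sapply(splits, function(s) length(s$test))
```

In each iteration the learner is fitted on the `train` rows and asked to predict the `test` rows, which is the point at which a "newdata" argument reaches the underlying predict call.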

Initialize the evaluation vectors

v_f1 <- c()
v_max_depth <- c()
v_eta <- c()

Actual model training

task <- makeMultilabelTask(data = data, target = lab) 
   
res <- tuneParams(lrn,task = task,resampling = rdesc, par.set = ps,
       control = ctrl, measures = multilabel.f1)

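# Record the best score and the tuned hyperparameters.
# res$x is a named list with one entry per tuned parameter (max_depth, eta),
# so the values are read by name; index 4 does not exist in this grid.
v_f1 <- c(v_f1, as.vector(res$y[1]))
v_max_depth <- c(v_max_depth, res$x$max_depth)
v_eta <- c(v_eta, res$x$eta)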

Tags: xgboost, multilabel-classification, mlr

Solution

