r - mlr - How to see the direction (positive or negative) in which features influence the target variable
Question
Let's start with the output of a simple linear regression (copied from here),
Call:
lm(formula = a1 ~ ., data = clean.algae[, 1:12])
Residuals:
Min 1Q Median 3Q Max
-37.679 -11.893 -2.567 7.410 62.190
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.942055 24.010879 1.788 0.07537 .
seasonspring 3.726978 4.137741 0.901 0.36892
seasonsummer 0.747597 4.020711 0.186 0.85270
seasonwinter 3.692955 3.865391 0.955 0.34065
sizemedium 3.263728 3.802051 0.858 0.39179
sizesmall 9.682140 4.179971 2.316 0.02166 *
speedlow 3.922084 4.706315 0.833 0.40573
speedmedium 0.246764 3.241874 0.076 0.93941
mxPH -3.589118 2.703528 -1.328 0.18598
mnO2 1.052636 0.705018 1.493 0.13715
Cl -0.040172 0.033661 -1.193 0.23426
NO3 -1.511235 0.551339 -2.741 0.00674 **
NH4 0.001634 0.001003 1.628 0.10516
oPO4 -0.005435 0.039884 -0.136 0.89177
PO4 -0.052241 0.030755 -1.699 0.09109 .
Chla -0.088022 0.079998 -1.100 0.27265
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 17.65 on 182 degrees of freedom
Multiple R-squared: 0.3731, Adjusted R-squared: 0.3215
F-statistic: 7.223 on 15 and 182 DF, p-value: 2.444e-12
From this output, we can see the model fit and which variables have a significant effect on the target variable. Moreover, by looking at the sign of a coefficient we can tell whether a variable's effect is negative or positive.
Now take a look at this example from the mlr package manual,
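As a minimal sketch of reading effect direction from such a fit (using the built-in mtcars data instead of clean.algae, so it runs stand-alone without the DMwR package):

```r
# Fit a small linear model on a built-in dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)        # a negative coefficient means the feature lowers the target
sign(coef(fit))  # -1 = negative effect, +1 = positive effect
```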
### Select features
sfeats = selectFeatures(learner = "surv.coxph", task = wpbc.task, resampling = rdesc,
control = ctrl, show.info = FALSE)
sfeats
## FeatSel result:
## Features (14): mean_radius, mean_compactness, mean_concavepoints, mean_symmetry, mean_fractaldim, SE_perimeter, SE_area, SE_concavity, SE_fractaldim, worst_radius, worst_perimeter, worst_concavity, worst_concavepoints, tsize
## cindex.test.mean=0.6718346
From the output above I can see the list of important features. My question is: how can I see the direction (positive or negative) in which each feature (independent variable) influences the target variable? Can anyone help with this? Pointers to reading material would also be appreciated.
Addendum
I am trying to implement your suggestion on the yeast.task multilabel classification example,
library(mlr)
library(mmpf)
yeast <- getTaskData(yeast.task)
labels <- colnames(yeast)[1:14]
yeast.task <- makeMultilabelTask(id = "multi", data = yeast, target = labels)
lrn.br <- makeLearner("classif.rpart", predict.type = "prob")
lrn.br <- makeMultilabelBinaryRelevanceWrapper(lrn.br)
mod <- mlr::train(lrn.br, yeast.task, subset = 1:1500, weights = rep(1/1500, 1500))
pred <- predict(mod, newdata = yeast[1501:1600,])
performance(pred, measures = list(multilabel.subset01, multilabel.hamloss, multilabel.acc,
multilabel.f1, timepredict))
rdesc <- makeResampleDesc(method = "CV", stratify = FALSE, iters = 3)
r <- resample(learner = lrn.br, task = yeast.task, resampling = rdesc, show.info = FALSE)
getMultilabelBinaryPerformances(pred, measures = list(acc, mmce, auc))
getMultilabelBinaryPerformances(r$pred, measures = list(acc, mmce))
getLearnerModel(mod)
pd <- generatePartialDependenceData(mod, yeast.task)
plotPartialDependence(pd)
The last three lines give me the output below. I'm not sure whether any of this is useful. Any idea what I'm doing wrong?
> getLearnerModel(mod)
$label1
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label1; obs = 1500; features = 103
Hyperparameters: xval=0
$label2
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label2; obs = 1500; features = 103
Hyperparameters: xval=0
$label3
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label3; obs = 1500; features = 103
Hyperparameters: xval=0
$label4
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label4; obs = 1500; features = 103
Hyperparameters: xval=0
$label5
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label5; obs = 1500; features = 103
Hyperparameters: xval=0
$label6
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label6; obs = 1500; features = 103
Hyperparameters: xval=0
$label7
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label7; obs = 1500; features = 103
Hyperparameters: xval=0
$label8
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label8; obs = 1500; features = 103
Hyperparameters: xval=0
$label9
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label9; obs = 1500; features = 103
Hyperparameters: xval=0
$label10
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label10; obs = 1500; features = 103
Hyperparameters: xval=0
$label11
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label11; obs = 1500; features = 103
Hyperparameters: xval=0
$label12
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label12; obs = 1500; features = 103
Hyperparameters: xval=0
$label13
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label13; obs = 1500; features = 103
Hyperparameters: xval=0
$label14
Model for learner.id=classif.rpart; learner.class=classif.rpart
Trained on: task.id = label14; obs = 1500; features = 103
Hyperparameters: xval=0
>
> pd <- generatePartialDependenceData(mod, yeast.task)
Error in data.table(preds, design[, vars, drop = FALSE], key = vars) :
column or argument 1 is NULL
> plotPartialDependence(pd)
Error in checkClass(x, classes, ordered, null.ok) : object 'pd' not found
Solution
If you want to extract the coefficients of the underlying learner model, you have to use getLearnerModel() in mlr:
library(mlr)
mod = train(learner = "surv.coxph", task = lung.task)
getLearnerModel(mod)
Output:
Call:
survival::coxph(formula = f, data = data)
coef exp(coef) se(coef) z p
inst -3.04e-02 9.70e-01 1.31e-02 -2.31 0.02062
age 1.28e-02 1.01e+00 1.19e-02 1.07 0.28340
sex -5.67e-01 5.67e-01 2.01e-01 -2.81 0.00489
ph.ecog 9.07e-01 2.48e+00 2.39e-01 3.80 0.00014
ph.karno 2.66e-02 1.03e+00 1.16e-02 2.29 0.02223
pat.karno -1.09e-02 9.89e-01 8.14e-03 -1.34 0.18016
meal.cal 2.60e-06 1.00e+00 2.68e-04 0.01 0.99224
wt.loss -1.67e-02 9.83e-01 7.91e-03 -2.11 0.03465
Likelihood ratio test=33.7 on 8 df, p=5e-05
n= 167, number of events= 120
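As a sketch of getting just the effect directions: the object returned by getLearnerModel() is the underlying survival::coxph fit, so the usual coef() accessor works on it. For a Cox model a negative coefficient means the feature lowers the hazard:

```r
library(mlr)

# Train a Cox model on the bundled lung.task and unwrap the fitted coxph object
mod <- train(learner = "surv.coxph", task = lung.task)
cox <- getLearnerModel(mod)

coef(cox)        # signed log hazard ratios
sign(coef(cox))  # -1 = lowers the hazard, +1 = raises it
```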
If you are interested in a learner-independent way of inspecting effects, you can look at partial dependence plots. For coxph they are, unsurprisingly, linear:
pd = generatePartialDependenceData(mod, lung.task)
plotPartialDependence(pd)
But you can also use a random forest for survival data:
mod2 = train(learner = "surv.randomForestSRC", task = lung.task)
pd2 = generatePartialDependenceData(mod2, lung.task)
plotPartialDependence(pd2)
However, partial dependence plots also have to be interpreted with care, so you should read up on them, e.g. here. You could also have a look at ICE plots.
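As a sketch of the ICE idea: in mlr, generatePartialDependenceData() accepts individual = TRUE, which produces one curve per observation instead of the averaged partial dependence (restricting to the "age" feature here is just an illustrative choice):

```r
library(mlr)

mod <- train(learner = "surv.coxph", task = lung.task)

# individual = TRUE yields per-observation (ICE) curves rather than the average
pd_ice <- generatePartialDependenceData(mod, lung.task,
                                        features = "age", individual = TRUE)
plotPartialDependence(pd_ice)
```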