r - R: Why does gbm give NA values on the Titanic data?
Problem description
I have the classic Titanic data. Here is the structure of the cleaned data.
> str(titanic)
'data.frame': 887 obs. of 7 variables:
$ Survived : Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 2 2 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 27 54 2 27 14 ...
$ Siblings.Spouses.Aboard: int 1 1 0 1 0 0 0 3 0 1 ...
$ Parents.Children.Aboard: int 0 0 0 0 0 0 0 1 2 0 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
First I split the data.
set.seed(123)
train_ind <- sample(seq_len(nrow(titanic)), size = smp_size)
train <- titanic[train_ind, ]
test <- titanic[-train_ind, ]
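The split above uses smp_size, which the snippet never defines. A minimal sketch of one way to define it, assuming a 75% training share (the original value is not shown) and using a hypothetical stand-in data frame titanic_demo with the same number of rows:

```r
# Hypothetical stand-in for the titanic data frame (only the row count matters here).
titanic_demo <- data.frame(id = seq_len(887))

# Assumption: 75% of the rows go to training; the original smp_size is not shown.
train_fraction <- 0.75
smp_size <- floor(train_fraction * nrow(titanic_demo))

set.seed(123)
train_ind <- sample(seq_len(nrow(titanic_demo)), size = smp_size)
train_demo <- titanic_demo[train_ind, , drop = FALSE]
test_demo  <- titanic_demo[-train_ind, , drop = FALSE]

# The two pieces are disjoint and together cover every row.
nrow(train_demo) + nrow(test_demo) == nrow(titanic_demo)
```

Any other training share works the same way; only train_fraction changes.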
Then I changed the Survived column to 0 and 1.
train$Survived <- as.factor(ifelse(train$Survived == 'Yes', 1, 0))
test$Survived <- as.factor(ifelse(test$Survived == 'Yes', 1, 0))
Finally, I ran the gradient boosting algorithm.
dt_gb <- gbm(Survived ~ ., data = train)
Here is the result.
> print(dt_gb)
gbm(formula = Survived ~ ., data = train)
A gradient boosted model with bernoulli loss function.
100 iterations were performed.
There were 6 predictors of which 0 had non-zero influence.
Because 0 predictors have non-zero influence, the predictions are NA. Why does this happen? Is there something wrong with my code?
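Solution
Avoid converting Survived to a 0/1 factor in the training and test data. Instead, change the Survived column to a 0/1 vector of type numeric, which is the response coding that gbm's Bernoulli distribution expects.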
# e.g. like this
# as.numeric() on the original No/Yes factor gives 1/2; subtracting 1 yields 0/1
titanic$Survived <- as.numeric(titanic$Survived) - 1
# data should look like this
> str(titanic)
'data.frame': 887 obs. of 7 variables:
$ Survived : num 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 27 54 2 27 14 ...
$ Siblings.Spouses.Aboard: int 1 1 0 1 0 0 0 3 0 1 ...
$ Parents.Children.Aboard: int 0 0 0 0 0 0 0 1 2 0 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
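The difference between the two codings can be checked in base R without gbm. The sketch below uses a small hypothetical vector survived in place of the real column. A plausible explanation, which the source does not confirm, is that a factor stores its values as integer level codes starting at 1, so a factor with levels "0" and "1" does not supply 0/1 numbers once it is coerced to numeric.

```r
# Original coding from the question: a factor with levels "No" and "Yes".
survived <- factor(c("No", "Yes", "Yes", "Yes", "No"), levels = c("No", "Yes"))

# The question's approach: the result is still a factor, now with levels "0" and "1".
as_factor01 <- as.factor(ifelse(survived == "Yes", 1, 0))
is.numeric(as_factor01)      # FALSE
as.numeric(as_factor01)      # internal level codes 1 and 2, not 0 and 1

# The answer's approach: subtract 1 from the level codes of the original factor.
as_numeric01 <- as.numeric(survived) - 1
is.numeric(as_numeric01)     # TRUE
as_numeric01                 # 0 1 1 1 0
```

Note that subtracting 1 only gives the intended result when applied to the original two-level factor, as in the answer.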
Then fit the model with the Bernoulli loss.
dt_gb <- gbm::gbm(formula = Survived ~ ., data = titanic,
distribution = "bernoulli")
> print(dt_gb)
gbm::gbm(formula = Survived ~ ., distribution = "bernoulli",
data = titanic)
A gradient boosted model with bernoulli loss function.
100 iterations were performed.
There were 6 predictors of which 6 had non-zero influence.
Get the predicted survival probabilities for the first few passengers:
> head(predict(dt_gb, type = "response"))
[1] 0.1200703 0.9024225 0.5875393 0.9271306 0.1200703 0.1200703
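To turn these probabilities into 0/1 predictions, a cutoff is needed. The sketch below assumes a cutoff of 0.5, which is a common default but not something the answer specifies, and compares the result with the first six Survived values from the str() output. The model was fitted on these same rows, so this is a sanity check rather than an estimate of out-of-sample accuracy.

```r
# Predicted survival probabilities copied from the output above.
prob <- c(0.1200703, 0.9024225, 0.5875393, 0.9271306, 0.1200703, 0.1200703)

# Observed outcomes for the same six passengers, taken from the str() output.
actual <- c(0, 1, 1, 1, 0, 0)

# Assumption: classify as survived when the probability exceeds 0.5.
cutoff <- 0.5
predicted <- as.numeric(prob > cutoff)

predicted                      # 0 1 1 1 0 0
mean(predicted == actual)      # share of the six passengers classified correctly
```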