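r - Simple decision tree in R: strange results from the caret package
Problem description
I am trying to fit a simple decision tree to the following dataset using the caret package. The data: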
> library(caret)
> mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
> mydata$rank <- factor(mydata$rank)
# create dummy variables
> X = predict(dummyVars(~ ., data=mydata), mydata)
> head(X)
A matrix: 6 × 7 of type dbl
admit gre gpa rank.1 rank.2 rank.3 rank.4
0 380 3.61 0 0 1 0
1 660 3.67 0 0 1 0
1 800 4.00 1 0 0 0
1 640 3.19 0 0 0 1
0 520 2.93 0 0 0 1
1 760 3.00 0 1 0 0
Split into training and test sets:
> trainset <- data.frame(X[1:300,])
> testset <- data.frame(X[301:400,])
Now fit the decision tree:
> tree <- train(factor(admit) ~., data = trainset, method = "rpart")
> tree
CART
300 samples
6 predictor
2 classes: '0', '1'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 300, 300, 300, 300, 300, 300, ...
Resampling results across tuning parameters:
cp Accuracy Kappa
0.01956522 0.6856163 0.1865179
0.03260870 0.6888378 0.1684015
0.08695652 0.7080434 0.1079462
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.08695652.
I get NaN for the variable importance! Why?
> varImp(tree)$importance
A data.frame: 6 × 1 Overall
<dbl>
gre NaN
gpa NaN
rank.1 NaN
rank.2 NaN
rank.3 NaN
rank.4 NaN
And at prediction time the decision tree outputs only one class, class 0. Why? What is wrong with my code? Thanks in advance.
> y_pred <- predict(tree ,newdata=testset)
> y_test <- factor(testset$admit)
> confusionMatrix(y_pred, factor(y_test))
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 65 35
1 0 0
Accuracy : 0.65
95% CI : (0.5482, 0.7427)
No Information Rate : 0.65
P-Value [Acc > NIR] : 0.5458
Kappa : 0
Mcnemar's Test P-Value : 9.081e-09
Sensitivity : 1.00
Specificity : 0.00
Pos Pred Value : 0.65
Neg Pred Value : NaN
Prevalence : 0.65
Detection Rate : 0.65
Detection Prevalence : 1.00
Balanced Accuracy : 0.50
'Positive' Class : 0
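Every figure in this output is what a classifier that always predicts class 0 would produce. A short base-R check makes that explicit, using only the counts printed in the confusion matrix above (the variable names are illustrative):

```r
# Confusion matrix from the output above: rows = prediction, columns = reference.
cm <- matrix(c(65, 0, 35, 0), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))
n <- sum(cm)

accuracy    <- sum(diag(cm)) / n                  # share of correct predictions
nir         <- max(colSums(cm)) / n               # no-information rate (majority class share)
sensitivity <- cm["0", "0"] / sum(cm[, "0"])      # positive class is "0"
specificity <- cm["1", "1"] / sum(cm[, "1"])
npv         <- cm["1", "1"] / sum(cm["1", ])      # 0 / 0, since class 1 is never predicted
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed accuracy versus the accuracy expected by chance.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - expected) / (1 - expected)

print(c(accuracy = accuracy, nir = nir, sensitivity = sensitivity,
        specificity = specificity, npv = npv,
        balanced = balanced, kappa = kappa))
```

Accuracy equals the no-information rate and Kappa is 0, so the model does no better than always guessing the majority class.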
Solution
I can't answer your question directly, but I can show you how I fit a decision tree:
library(data.table)
library(tidyverse)
library(caret)
library(rpart)
library(rpart.plot)
# Reading data into data.table
mydata <- fread("https://stats.idre.ucla.edu/stat/data/binary.csv")
# converting rank and admit to factors
mydata$rank <- as.factor(mydata$rank)
mydata$admit <- as.factor(mydata$admit)
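# creating train and test data
set.seed(123)  # arbitrary seed so the random split is reproducible
# createDataPartition(list=FALSE) returns a one-column matrix; [, 1] makes it a plain index vector for data.table
t_index <- createDataPartition(mydata$admit, p=0.75, list=FALSE)[, 1]
trainset <- mydata[t_index,]
testset <- mydata[-t_index,]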
# calculating the model using rpart
model <- rpart(admit ~ .,
               data = trainset,
               parms = list(split = "information"),
               method = "class")
# plotting the decision tree
model %>%
  rpart.plot(digits = 4)
# get confusion matrix
model %>%
  predict(testset, type = "class") %>%
  table(testset$admit) %>%
  confusionMatrix()
Maybe this helps you.
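As for the NaN importances and the single predicted class in the question, one likely explanation is that the selected cp = 0.08695652 prunes the tree back to its root node. This is an assumption based on the printed output, not something verified here. A tree with no splits has no variable importance to scale and always predicts the majority class. Below is a minimal sketch for checking this and for trying smaller cp values; the seed, the cp grid, and the 10-fold cross-validation setting are arbitrary choices for illustration:

```r
library(caret)

# Rebuild the training set the same way as in the question.
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
X <- predict(dummyVars(~ ., data = mydata), mydata)
trainset <- data.frame(X[1:300, ])
trainset$admit <- factor(trainset$admit)

set.seed(42)  # arbitrary seed, only for reproducibility
tree <- train(admit ~ ., data = trainset, method = "rpart")

# A final model with a single row in $frame is a root-only tree (no splits).
print(tree$finalModel)
n_nodes <- nrow(tree$finalModel$frame)
cat("nodes in final tree:", n_nodes, "\n")

# Retrain over smaller cp values so that splits can survive pruning.
cp_grid <- data.frame(cp = c(0.001, 0.005, 0.01, 0.02))
tree_small_cp <- train(
  admit ~ ., data = trainset, method = "rpart",
  trControl = trainControl(method = "cv", number = 10),
  tuneGrid  = cp_grid
)
print(tree_small_cp$finalModel)
varImp(tree_small_cp)$importance
```

If the first tree reports a single node, that would account for both symptoms. Whether a smaller cp actually generalizes better on this data is a separate question, and the held-out test set is the place to check it.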