R returns an error: subscript out of bounds

Problem description

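I have been working with the College dataset from the R ISLR package. I want to perform best subset selection on the training set and plot the training-set MSE of the best model of each size.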

library(ISLR)
library(leaps)
data(College)
head(College)

#splitting the data into 70/30
subset<- sample(nrow(college)*0.7)
collegetrain<- college[subset,]
collegetest<-college[-subset,]

Here is my code:

regfit.full <- regsubsets(apps ~ ., data = college.train, nvmax = 20)
train.mat <- model.matrix(apps ~ ., data = college.train, nvmax = 20)
val.errors <- rep(NA, 20)
for (i in 1:20) {
coefi <- coef(regfit.full, id = i)
pred <- train.mat[, names(coefi)] %*% coefi
val.errors[i] <- mean((pred - college.train$y)^2)
}
plot(val.errors, xlab = "Number of predictors", ylab = "Training MSE", pch = 19, type = "b")

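The dataset is structured as follows: 777 observations, of which 543 are in the training set and 234 are in the test set. There are 18 variables: 17 are numeric and 1 is a yes/no factor (which does not need to be changed).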

The error message I get when running the code is: Error in s$which[id, , drop = FALSE] : subscript out of bounds

Tags: r, plot, statistics, mse

Solution


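The error comes from the loop bound. College has 18 variables, so once the response Apps is removed there are 17 predictors (the factor Private contributes a single dummy column). regsubsets() therefore returns at most 17 model sizes even with nvmax = 20, and coef(regfit.full, id = 18) asks for a row of s$which that does not exist. R is also case-sensitive, so the names have to match the objects that were actually created: Apps rather than apps, collegetrain rather than college.train, and collegetrain$Apps rather than college.train$y. The code below assumes collegetrain was built from College (capital C):

# Best subset selection on the training set; only 17 predictors are available
regfit.full <- regsubsets(Apps ~ ., data = collegetrain, nvmax = 17)
# Design matrix with the same column names as the fitted coefficients
# (model.matrix() has no nvmax argument, so it is dropped here)
train.mat <- model.matrix(Apps ~ ., data = collegetrain)

val.errors <- rep(NA, 17)
for (i in 1:17) {
  # coefficients of the best model with i predictors
  coefi <- coef(regfit.full, id = i)
  pred <- train.mat[, names(coefi)] %*% coefi
  val.errors[i] <- mean((pred - collegetrain$Apps)^2)
}

plot(val.errors, xlab = "Number of predictors", ylab = "Training MSE", 
     pch = 19, type = "b")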
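
The loop bound of 17 above is hard-coded. As a minimal sketch, the number of available model sizes can instead be read from the fitted object, so the loop never asks for a size that regsubsets() did not return. The names train.idx, train, n.sizes and train.mse, and the seed, are illustrative choices and not part of the original post. The sketch also draws the 70% split from all row indices. As far as I can tell, sample(nrow(College) * 0.7) only permutes the first 543 rows, which is probably not what was intended.

```r
library(ISLR)   # College data
library(leaps)  # regsubsets()

set.seed(1)  # assumed seed, only to make the split reproducible
n <- nrow(College)
# Draw 70% of ALL row indices for the training set
train.idx <- sample(n, floor(0.7 * n))
train <- College[train.idx, ]

# nvmax may exceed the number of predictors; the fit is capped at what exists
regfit.full <- regsubsets(Apps ~ ., data = train, nvmax = 20)
reg.summary <- summary(regfit.full)
# One row of $which per model size that was actually fitted
n.sizes <- nrow(reg.summary$which)

train.mat <- model.matrix(Apps ~ ., data = train)
train.mse <- rep(NA_real_, n.sizes)
for (i in seq_len(n.sizes)) {
  coefi <- coef(regfit.full, id = i)
  # drop = FALSE keeps a matrix even if only a few columns are selected
  pred <- train.mat[, names(coefi), drop = FALSE] %*% coefi
  train.mse[i] <- mean((pred - train$Apps)^2)
}

plot(seq_len(n.sizes), train.mse, xlab = "Number of predictors",
     ylab = "Training MSE", pch = 19, type = "b")
```

Since the summary object already stores the residual sum of squares for each size, reg.summary$rss / nrow(train) should give the same training MSE values without the loop. The explicit loop is still the pattern to reuse for test-set error, where no such shortcut exists.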
