Error: number of observations in y not equal to the number of rows of x

Problem description

Here is my code:

ames_train_x <- model.matrix(Value ~ ., train)[, -1]
ames_train_y <- log(train$Value)

ames_test_x <- model.matrix(Value ~ ., test)[, -1]
ames_test_y <- log(test$Value)

# Applying LASSO REGRESSION to data

ames_lasso <- glmnet(
  x = ames_train_x,
  y = ames_train_y,
  alpha = 1
)

I get the following error:

Error in glmnet(x = ames_train_x, y = ames_train_y, alpha = 1) : 
  number of observations in y (3528) not equal to the number of rows of x (3527)

What am I doing wrong?

Tags: r, regression, glmnet

Solution


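Most likely train contains NA values. model.matrix drops every row that contains an NA (the default na.action is na.omit), while log(train$Value) keeps all rows, so x ends up with fewer rows than y. The mtcars example below reproduces this: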

library(glmnet)
df <- mtcars
train_x <- model.matrix(mpg ~ ., df)[, -1]
dim(train_x)
[1] 32 10
train_y <- log(df$mpg)
fit = glmnet(y=train_y,x=train_x)

# now we set 1 value to be NA
df["Fiat 128","cyl"]<-NA
train_x <- model.matrix(mpg ~ ., df)[, -1]
"Fiat 128" %in% rownames(train_x)
[1] FALSE
dim(train_x)
[1] 31 10

Fitting this gives the error you saw:

fit = glmnet(y=train_y,x=train_x)
Error in glmnet(y = train_y, x = train_x) : 
  number of observations in y (32) not equal to the number of rows of x (31)
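
One possible fix, sketched here on the same mtcars example rather than on the original train data, is to find the incomplete rows and drop them before building x and y, so that both come from the same set of rows. The name df_complete is only illustrative, and whether dropping rows (as opposed to imputing the missing values) is appropriate depends on your data.

```r
# Recreate the situation from the example: one NA in a predictor column
df <- mtcars
df["Fiat 128", "cyl"] <- NA

# Diagnose: count missing values per column and list the affected rows
na_per_column <- colSums(is.na(df))
na_per_column[na_per_column > 0]
rownames(df)[!complete.cases(df)]

# Fix: keep only complete rows, then derive x and y from that one data frame
df_complete <- df[complete.cases(df), ]
train_x <- model.matrix(mpg ~ ., df_complete)[, -1]
train_y <- log(df_complete$mpg)

# The row counts now agree, so glmnet will accept the pair
stopifnot(nrow(train_x) == length(train_y))

# Fit the lasso only if glmnet is installed
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x = train_x, y = train_y, alpha = 1)
}
```

The same pattern should carry over to the question's code: filter train (and test) with complete.cases first, then compute both model.matrix(Value ~ ., ...) and log(...$Value) from the filtered data frame.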
