Error when using train(): undefined columns selected

Problem description

I have looked at other posted questions about this issue, but I could not get my code to work.

Here is my code:

library(titanic)  
library(caret)
library(tidyverse)
library(rpart)

# 3 significant digits
options(digits = 3)

# clean the data
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age), # NA age to median age
         FamilySize = SibSp + Parch + 1) %>%    # count family members
  select(Survived,  Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[index, ]
train_set <- titanic_clean[-index, ]

caret::train(train_set$Survived ~ train_set$Fare, method="glm", data=train_set)

The train function returns the following error:

Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) : 
  undefined columns selected

Any ideas?

Tags: r, r-caret, training-data

Solution
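
The error message points at `data[, all.vars(Terms), drop = FALSE]`, so a likely cause is the formula itself. Writing `train_set$Survived ~ train_set$Fare` puts the name of the data frame into the formula, and `all.vars()` then reports `train_set` as if it were a column to select from `data`. The sketch below reproduces that column selection in base R. The data frame `toy` and its values are hypothetical stand-ins for `train_set`, used only so the example runs without the titanic and caret packages.

```r
# Toy stand-in for train_set (hypothetical values, for illustration only)
toy <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1, 0)),
  Fare     = c(7.25, 71.28, 7.92, 8.05, 53.10, 8.46)
)

# A formula written with `$` mentions the data frame itself,
# so "toy" appears among the variable names
bad_vars <- all.vars(toy$Survived ~ toy$Fare)

# A formula written with bare column names mentions only real columns
good_vars <- all.vars(Survived ~ Fare)

# Selecting columns by bad_vars fails, because "toy" is not a column of toy;
# the error message is captured instead of stopping the script
bad_msg <- tryCatch(
  {
    toy[, bad_vars, drop = FALSE]
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

# Selecting columns by good_vars succeeds
good_cols <- toy[, good_vars, drop = FALSE]

print(bad_vars)
print(bad_msg)
print(names(good_cols))
```

Applied to the code in the question, this suggests using bare column names and letting the `data` argument supply the data frame: `caret::train(Survived ~ Fare, method = "glm", data = train_set)`.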

