Error in randomForest: NA, missing values in object

Question

When I try:

marketing.rf <- randomForest(formula = as.numeric(y) ~., data = marketing.train, importance = TRUE) 

it gives this error:

Error in na.fail.default(list(`as.numeric(y)` = c(NA_real_, NA_real_, : missing values in object

When I try this instead:

y.val <- ifelse(marketing.train$y=="yes", 1,0)
marketing.rf <-  randomForest(formula = as.numeric(y.val) ~., data = marketing.train, importance = TRUE) 

it gives a different error:

Error in randomForest.default(m, y, ...) : NA/NaN/Inf in foreign function call (arg 1)

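I also tried as.factor(y), but it gave a similar error. I used dput(marketing.test$y) to inspect the values and could not find any NA or invalid values in them.

I am very new to R. Can anyone help me solve this? Thanks!

Here is a sample of the training data: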

age job             marital     edu         default   balance  housing   loan   y
58  management      married     tertiary    no        2143     yes       no     no
33  entrepreneur    married     secondary   no        2        yes       yes    no
33  unknown         single      unknown     no        1        no        no     no
42  entrepreneur    divorced    tertiary    yes       2        yes       no     no

Tags: r, random-forest

Answer


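Here is a complete example using reproducible (reprex) data. Without your data I can't give a perfect answer, but if you follow this logic you should be fine.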

library(randomForest)

# Generate some fake data (seeded so the example is reproducible)
set.seed(42)
fake_data <- data.frame(
  age = runif(500, 30, 65),
  marital = sample(c("single", "married", "divorced"), 500, TRUE),
  default = sample(c("yes", "no"), 500, TRUE),
  balance = runif(500, 0, 2100),
  housing = sample(c("yes", "no"), 500, TRUE),
  loan = sample(c("yes", "no"), 500, TRUE),
  stringsAsFactors = FALSE
)

# Add some missing values to one column, for the sake of the example

fake_data[sample(x = 1:500, size = 5), "loan"] <- NA

# Drop the rows where loan is NA

fake_data_2 <- fake_data[!is.na(fake_data$loan), ]

cat("You have removed", nrow(fake_data) - nrow(fake_data_2), "records\n")

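# Use loan as the target y, make sure it is a factor, and drop the original
# column so the target does not leak into the predictors

fake_data_2$y <- as.factor(fake_data_2$loan)
fake_data_2$loan <- NULL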

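# Convert character columns to factors
library(dplyr)

fake_data_2 <- fake_data_2 %>% 
  mutate_if(is.character, as.factor)

# Fit the model: y is a factor, so this is a classification forest
fit <- randomForest(y ~ ., data = fake_data_2)

This produces a valid random forest model.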
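
To relate this back to the errors in the question, here is a minimal sketch of the most likely cause. It assumes the text columns in marketing.train (including y) are stored as character rather than factor. The data frame marketing_train below is a hypothetical stand-in built from the four sample rows in the question, not the real data. It uses base R only, and the final model call is left as a comment because the four sample rows contain only one class of y.

```r
# Hypothetical stand-in for marketing.train, built from the four sample rows
# in the question. Text columns are kept as character (assumption).
marketing_train <- data.frame(
  age     = c(58, 33, 33, 42),
  job     = c("management", "entrepreneur", "unknown", "entrepreneur"),
  marital = c("married", "married", "single", "divorced"),
  edu     = c("tertiary", "secondary", "unknown", "tertiary"),
  default = c("no", "no", "no", "yes"),
  balance = c(2143, 2, 1, 2),
  housing = c("yes", "yes", "no", "yes"),
  loan    = c("no", "yes", "no", "no"),
  y       = c("no", "no", "no", "no"),
  stringsAsFactors = FALSE
)

# as.numeric() cannot parse text such as "yes"/"no", so every value becomes
# NA (with a coercion warning). The raw column itself contains no NA, which
# is why dput() on it shows nothing wrong.
y_numeric <- suppressWarnings(as.numeric(marketing_train$y))
print(y_numeric)  # all four values are NA

# Count NAs per column in the raw data and list the column classes.
na_counts   <- colSums(is.na(marketing_train))
col_classes <- sapply(marketing_train, class)
print(na_counts)
print(col_classes)

# Convert every character column (including the target y) to a factor.
is_chr <- sapply(marketing_train, is.character)
marketing_clean <- marketing_train
marketing_clean[is_chr] <- lapply(marketing_clean[is_chr], as.factor)
str(marketing_clean)

# On the full data, where y has both "yes" and "no", the call would become:
# fit <- randomForest(y ~ ., data = marketing_clean, importance = TRUE)
```

If the column classes show character where categories are expected, that would also explain the second error: character predictors left in the data are commonly what produces NA values when the data is handed to the underlying fitting routine.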

