Why do I get an error when I try to predict with a random forest model?

Problem description

I am trying to make predictions with a random forest model.

My data looks like this:

> str(margins_data)
'data.frame':   457961 obs. of  10 variables:
$ month           : Factor w/ 7 levels "April","August",..: 6 6 4 6 2 1 5 6 5 4 ...
$ miles           : num  416 1559 1156 672 1188 ...
$ equipment       : Factor w/ 3 levels "Flat","Reefer",..: 1 3 3 3 3 2 3 2 3 3 ...
$ originstate     : Factor w/ 62 levels "  ","AB","AL",..: 20 55 14 34 14 56 14 34 57 14 ...
$ destinationstate: Factor w/ 62 levels "AB","AK","AL",..: 17 7 55 27 55 8 55 32 46 12 ...
$ margin          : num  800 450 450 200 450 700 500 375 200 200 ...
$ ldi             : num  2.5 4.84 3.1 1.75 3.35 ...
$ weight          : int  40000 43000 40000 10000 39000 35000 39000 7817 38000 42720 ...
$ commoditygroup  : Factor w/ 49 levels "Agriculture",..: 18 9 18 15 42 38 18 22 27 18 ...
$ customerindustry: Factor w/ 352 levels "Abrasive, Asbestos, And Miscellaneous",..: 300 336 336 229 336 133 336 133 260 264 ...
- attr(*, "na.action")= 'omit' Named int  1182 2282 2869 2999 3082 4609 5360 5444 5445 6029 ...
..- attr(*, "names")= chr  "1182" "2282" "2869" "2999" ...

I split the data into training and test sets:

N <- nrow(margins_data)

target <- round(N * 0.75)

gp <- runif(N)

margin_train <- margins_data[gp < 0.75, ]

margin_test <- margins_data[gp >= 0.75, ]
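
The split above draws `gp` from `runif()` without fixing the random seed first, so the rows that land in each set change from run to run (and `target` is computed but never used). A minimal sketch of a repeatable version follows. It assumes a small made-up data frame `toy` in place of `margins_data` and an arbitrary seed value; neither comes from the original post.

```r
# Hypothetical stand-in for margins_data; the columns are illustrative only.
toy <- data.frame(
  miles  = c(416, 1559, 1156, 672, 1188, 903, 250, 1800),
  margin = c(800, 450, 450, 200, 450, 700, 500, 375)
)

set.seed(42)                      # arbitrary seed, fixed so the split is repeatable
gp <- runif(nrow(toy))            # one uniform draw per row

toy_train <- toy[gp < 0.75, ]     # roughly 75% of the rows
toy_test  <- toy[gp >= 0.75, ]    # the remaining rows

# Every row ends up in exactly one of the two sets.
stopifnot(nrow(toy_train) + nrow(toy_test) == nrow(toy))
```

Calling `set.seed()` with the same value before `runif()` reproduces the same `gp` vector, and therefore the same training and test rows.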

Then I defined my model parameters:

seed <- 423563

outcome <- "margin"

vars <- c("miles", "equipment", "originstate", "destinationstate", "margin", 
"ldi", "weight", "commoditygroup", "customerindustry")

 fmla <- paste(outcome, "~", paste(vars, collapse = " + "))

 margin_model_rf <- ranger(fmla,
              margin_train,
              num.trees = 500,
              respect.unordered.factors = "order",
              seed = seed)


margin_model_rf

Call:
ranger(fmla, margin_train, num.trees = 500, respect.unordered.factors = "order", seed = seed)

Type:                             Regression 
Number of trees:                  500 
Sample size:                      343253 
Number of independent variables:  9 
Mtry:                             3 
Target node size:                 5 
Variable importance mode:         none 
Splitrule:                        variance 
OOB prediction error (MSE):       840.8202 
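
One thing visible in the model definition is that `vars` contains the outcome `"margin"` itself, so the pasted formula has `margin` on both sides, and that `fmla` is a character string rather than a formula object. Whether either point contributes to the error reported here is not established. The sketch below only shows one way, using base R's `setdiff()` and `reformulate()`, to build the formula so that the outcome cannot appear among the predictors. The name `predictors` is introduced for illustration.

```r
outcome <- "margin"

# Same vector as in the question; note that it includes the outcome itself.
vars <- c("miles", "equipment", "originstate", "destinationstate", "margin",
          "ldi", "weight", "commoditygroup", "customerindustry")

# Drop the outcome from the predictor list before building the formula.
predictors <- setdiff(vars, outcome)

# reformulate() returns a formula object rather than a character string.
fmla <- reformulate(predictors, response = outcome)
print(fmla)
```

The resulting `fmla` can be passed as the first argument to a modelling function in place of the pasted string.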

When I try to predict on the test data, I get the following error:

margin_predict <- predict(margin_model_rf, margin_test)
Error: Missing data in columns: weight.
In addition: Warning message:
In mapply(function(x, y) { :
longer argument not a multiple of length of shorter
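
The error message names `weight` as having missing data at prediction time. A first diagnostic step is to count `NA`s per column in the test set and to check that the training and test sets have the same columns. The sketch below does this on two small made-up data frames, `train` and `test`, standing in for `margin_train` and `margin_test`. It only helps locate the problem and is not a confirmed explanation of the error.

```r
# Hypothetical stand-ins for margin_train and margin_test.
train <- data.frame(miles  = c(416, 1559, 1156),
                    weight = c(40000L, 43000L, 40000L),
                    margin = c(800, 450, 450))
test  <- data.frame(miles  = c(672, 1188),
                    weight = c(NA, 39000L),
                    margin = c(200, 450))

# Number of missing values in each column of the test set.
na_counts <- colSums(is.na(test))
print(na_counts[na_counts > 0])

# Columns present in one set but not the other (both should be empty).
print(setdiff(names(train), names(test)))
print(setdiff(names(test), names(train)))

# Keep only the test rows that have no missing values.
test_complete <- test[complete.cases(test), ]
```

If `na_counts` is zero for every column of the real test set, the missing values are probably introduced later, during prediction, rather than being present in the data itself.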

Any help with this would be greatly appreciated.

Tags: r, machine-learning, random-forest

Solution

