Fatal error when running a linear regression with many terms on the RHS

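Problem description

When I run lm, RStudio gives me a fatal error and restarts my session. My source data has several factor columns, each of which is expanded into hundreds of dummy variables. ld.vars below, which extracts the linearly dependent dummy variables, has 363 entries. So my overall regression equation has hundreds of terms on the RHS: all columns and dummy variables, minus the linearly dependent dummy variables:

y = (all x variables, including every automatically generated dummy variable) - (363 linearly dependent dummy variables)

Is the length of the RHS the source of my fatal error?

My attempt is below, using code from this SO solution.

Code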

library(car)

load("full_data.rda")

## build original regression with all dummy variables included
formula <- as.formula(paste0("Closing.Cost ~ ", paste(colnames(full_data[-19]), collapse=' + ')))
reg3 <- lm(formula, full_data)

## this line produces a warning: "prediction from a rank-deficient fit may be misleading"
predict(reg3, newdata=full_data[106,], interval="prediction")

## this line produces an error: "Error in vif.default(reg3) : there are aliased coefficients in the model"
vif(reg3)

## find the linearly dependent variables
ld.vars <- attributes(alias(reg3)$Complete)$dimnames[[1]]
ld.vars <- paste0("`", ld.vars, "`")

## remove the linearly dependent variables
formula.new <- as.formula(paste0(formula, " - ", paste0(ld.vars, collapse = " - "), collapse = " - "))

## run new model: this line produces a fatal error
reg4 <- lm(formula.new, full_data)

## assess collinearity of new regression (haven't been able to run this line)
vif(reg4)
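
One thing that may be worth checking before blaming the number of terms is what the `paste0(formula, ...)` call above actually builds. `paste0()` applied to a formula object works on its three parts (`~`, the response, and the RHS) separately. The sketch below uses a toy formula and made-up term names (`f`, `drop_terms`, `x1` to `x3` are all hypothetical) and compares that with deparsing the formula first. Whether this has anything to do with the crash is not established here.

```r
# Toy stand-ins for `formula` and `ld.vars` (hypothetical names)
f <- y ~ x1 + x2 + x3
drop_terms <- c("`x2`", "`x3`")
drop_txt <- paste0(drop_terms, collapse = " - ")

# paste0() on a formula is element-wise over "~", the response, and the RHS
pieces <- paste0(f, " - ", drop_txt)
length(pieces)  # 3
print(pieces)

# Collapsing those pieces yields a string that starts with "~",
# so as.formula() returns a one-sided formula with no response
collapsed <- paste0(f, " - ", drop_txt, collapse = " - ")
print(as.formula(collapsed))

# Deparsing first keeps "response ~ rhs" together as one string
f_txt <- paste(deparse(f, width.cutoff = 500L), collapse = " ")
f_new <- as.formula(paste(f_txt, "-", drop_txt))
print(f_new)
print(attr(terms(f_new), "term.labels"))
```

`deparse()` can split a long formula over several strings, which is why its result is collapsed before the terms to remove are appended.
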
> matrix <- model.matrix(formula, full_data)
> dim(matrix)
[1] 4179 1311
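
Since the model matrix itself can be built, one possible way to avoid a formula with hundreds of subtracted terms is to drop the dependent columns from the matrix and fit with `lm.fit()`. This is a minimal sketch on toy data, not the method from the linked SO solution; `toy`, `f1`, `f2` and `x` are hypothetical names, and `f2` is made an exact copy of `f1` so that its dummy columns are redundant.

```r
set.seed(1)
n <- 200
toy <- data.frame(
  y  = rnorm(n),
  f1 = factor(rep(letters[1:5], length.out = n)),
  x  = rnorm(n)
)
toy$f2 <- toy$f1  # duplicate factor: its dummies repeat those of f1

# Build the design matrix once, as in model.matrix(formula, full_data)
X <- model.matrix(y ~ f1 + x + f2, data = toy)

# The pivoted QR decomposition moves linearly dependent columns to the end
qr_X <- qr(X)
keep <- qr_X$pivot[seq_len(qr_X$rank)]
dropped <- colnames(X)[-keep]
print(dropped)

# Fit on the full-rank subset of columns only
fit <- lm.fit(X[, keep, drop = FALSE], toy$y)
print(round(fit$coefficients, 3))
```

`lm.fit()` returns a plain list rather than an `lm` object, so helpers such as `vif()` from car would not apply to `fit` directly.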

I have several factors with hundreds of levels:

> total_levels <- full_data %>% purrr::map(levels) %>% map(length)
> Reduce("+", total_levels)
[1] 1271
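
The level total and the column count of the model matrix are related but not equal: with R's default treatment contrasts, each factor contributes one column per level except its first, and numeric predictors and the intercept contribute one column each. A small base-R sketch of that bookkeeping, on a hypothetical data frame `toy`:

```r
toy <- data.frame(
  y = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
  a = factor(c("p", "q", "r", "p", "q", "r")),
  b = factor(c("u", "v", "u", "v", "u", "v")),
  x = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)
)

# Levels per column; non-factor columns report 0
n_levels <- vapply(toy, nlevels, integer(1))
total_levels <- sum(n_levels)

# Expected design columns for y ~ . :
# intercept + (levels - 1) per factor + one per numeric predictor
is_fac <- vapply(toy, is.factor, logical(1))
n_numeric_predictors <- sum(!is_fac) - 1  # exclude the response y
expected_cols <- 1 + sum(n_levels[is_fac] - 1) + n_numeric_predictors

actual_cols <- ncol(model.matrix(y ~ ., data = toy))
print(c(levels = total_levels, expected = expected_cols, actual = actual_cols))
```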

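Following @BenBolker's suggestion in the comments, I tried a generic lm call with a large number of terms on the RHS (2582 columns, enough to match the number of terms I tried in the example above). This works fine: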

dd <- as.data.frame(matrix(rnorm(4179*2582),ncol=2582)); m1 <- lm(V1 ~ ., data=dd)
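
That test uses independent numeric columns only, so it involves neither factor expansion nor aliased coefficients. A scaled-down sketch that is closer in structure to the real data might look like the following. All names (`sim`, `g1`, `g2`, `g3`) and sizes are assumptions chosen for illustration; `g3` is derived from `g1`, so its dummy variables are linear combinations of those of `g1`.

```r
set.seed(42)
n <- 500
sim <- data.frame(
  y  = rnorm(n),
  g1 = factor(sample(sprintf("a%03d", 1:60), n, replace = TRUE)),
  g2 = factor(sample(sprintf("b%03d", 1:60), n, replace = TRUE))
)
# g3 groups the levels of g1, which makes its dummies linearly dependent
sim$g3 <- factor(as.integer(sim$g1) %% 5)

m <- lm(y ~ ., data = sim)

# Aliased coefficients show up as NA in the fit and as rows of alias()$Complete
n_na <- sum(is.na(coef(m)))
n_aliased <- nrow(alias(m)$Complete)
print(c(na_coefs = n_na, aliased_rows = n_aliased))
```

Scaling `n` and the number of levels up toward the real dimensions would show whether the crash depends on the size of the problem or on the way the formula is built.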

[Image: fatal error pop-up]

Session info (although this was captured before running the code chunk that triggers the fatal error):

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] car_3.0-2         carData_3.0-2     rattle_5.2.0      rpart_4.1-13      caret_6.0-81      lattice_0.20-38   DT_0.5           
 [8] plotly_4.8.0      generics_0.0.2    broom_0.5.1       lubridate_1.7.4   janitor_1.1.1     ggrepel_0.8.0     rio_0.5.16       
[15] data.table_1.12.0 forcats_0.4.0     stringr_1.4.0     dplyr_0.8.0.1     purrr_0.3.1       readr_1.3.1       tidyr_0.8.3      
[22] tibble_2.0.1      ggplot2_3.1.0     tidyverse_1.2.1   kableExtra_1.0.1 

loaded via a namespace (and not attached):
 [1] nlme_3.1-137       RColorBrewer_1.1-2 webshot_0.5.1      httr_1.4.0         tools_3.5.2        backports_1.1.3    R6_2.4.0          
 [8] lazyeval_0.2.1     colorspace_1.4-0   nnet_7.3-12        withr_2.1.2        tidyselect_0.2.5   curl_3.3           compiler_3.5.2    
[15] cli_1.0.1          rvest_0.3.2        xml2_1.2.0         bookdown_0.9       scales_1.0.0       digest_0.6.18      foreign_0.8-71    
[22] rmarkdown_1.11     pkgconfig_2.0.2    htmltools_0.3.6    htmlwidgets_1.3    rlang_0.3.1        readxl_1.3.0       rstudioapi_0.9.0  
[29] shiny_1.2.0        jsonlite_1.6       crosstalk_1.0.0    ModelMetrics_1.2.2 zip_2.0.0          magrittr_1.5       Matrix_1.2-15     
[36] Rcpp_1.0.0         munsell_0.5.0      abind_1.4-5        stringi_1.3.1      yaml_2.2.0         MASS_7.3-51.1      plyr_1.8.4        
[43] recipes_0.1.4      grid_3.5.2         promises_1.0.1     crayon_1.3.4       haven_2.1.0        splines_3.5.2      hms_0.4.2         
[50] knitr_1.22         pillar_1.3.1       reshape2_1.4.3     codetools_0.2-15   stats4_3.5.2       glue_1.3.0         evaluate_0.13     
[57] blogdown_0.11      rpart.plot_3.0.6   modelr_0.1.4       httpuv_1.4.5.1     foreach_1.4.4      cellranger_1.1.0   gtable_0.2.0      
[64] assertthat_0.2.0   xfun_0.5           gower_0.2.0        openxlsx_4.1.0     mime_0.6           prodlim_2018.04.18 xtable_1.8-3      
[71] later_0.8.0        class_7.3-14       survival_2.43-3    viridisLite_0.3.0  timeDate_3043.102  iterators_1.0.10   lava_1.6.5        
[78] ipred_0.9-8 

Tags: r, lm
