r - Fatal error when running a linear regression with many terms on the RHS
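Problem description
When I run lm, RStudio gives me a fatal error and restarts my session. My source data has several factor columns, each of which expands into hundreds of dummy variables. ld.vars below, which holds the linearly dependent dummy variables, has 363 entries. So my overall regression equation has hundreds of terms on the RHS: all columns, with their dummy variables, minus the linearly dependent dummy variables:
y = (all x variables, including every automatically generated dummy variable) - (363 linearly dependent dummy variables)
Is the length of the RHS the source of my fatal error?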
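To make "linearly dependent dummy variables" concrete, here is a minimal sketch on made-up data (the data frame toy and its columns g and h are illustrative assumptions, not part of full_data). The factor h is a coarsening of g, so its dummy column can be written as a combination of the intercept and the dummies of g, and lm reports it as aliased.

```r
# Toy data (hypothetical): h is fully determined by g.
set.seed(42)
toy <- data.frame(
  y = rnorm(40),
  g = factor(rep(c("a", "b", "c", "d"), each = 10))
)
toy$h <- factor(ifelse(toy$g %in% c("a", "b"), "low", "high"))

# Fit with both factors; the dummy for h is redundant given g.
fit <- lm(y ~ g + h, data = toy)

# Names of the coefficients that lm could not estimate separately.
aliased <- rownames(alias(fit)$Complete)
print(aliased)

# Aliased coefficients show up as NA in the fitted model.
print(coef(fit))
```

The names returned here are coefficient names (factor name pasted to level name), not column names of the data frame.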
My attempt is below, using code from this SO solution.
Code
library(car)
load('full_data.rda')
## build original regression with all dummy variables included
formula <- as.formula(paste0("Closing.Cost ~ ", paste(colnames(full_data[-19]), collapse=' + ')))
reg3 <- lm(formula, full_data)
## this line produces a warning: "prediction from a rank-deficient fit may be misleading"
predict(reg3, newdata=full_data[106,], interval="prediction")
## this line produces an error: "Error in vif.default(reg3) : there are aliased coefficients in the model"
vif(reg3)
## find the linearly dependent variables
ld.vars <- attributes(alias(reg3)$Complete)$dimnames[[1]]
ld.vars <- paste0("`", ld.vars, "`")
## remove the linearly dependent variables
formula.new <- as.formula(paste0(formula, " - ", paste0(ld.vars, collapse = " - "), collapse = " - "))
## run new model: this line produces a fatal error
reg4 <-lm(formula.new, full_data)
## assess collinearity of new regression (haven't been able to run this line)
vif(reg4)
> matrix <- model.matrix(formula, full_data)
> dim(matrix)
[1] 4179 1311
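The column count above can be compared with the numerical rank of the model matrix. The sketch below, on made-up data (toy, f1, f2 and x are hypothetical names), shows one possible way to drop the dependent columns directly from the model matrix and fit with lm.fit instead of editing a long formula. It is an alternative under stated assumptions, not a confirmed fix for the crash described here.

```r
# Toy data (hypothetical): f2 is a relabelling of f1, so their dummies coincide.
set.seed(1)
toy <- data.frame(
  y  = rnorm(60),
  f1 = factor(rep(c("a", "b", "c"), times = 20)),
  x  = rnorm(60)
)
toy$f2 <- factor(ifelse(toy$f1 == "a", "p", ifelse(toy$f1 == "b", "q", "r")))

# Build the design matrix once, then inspect its rank via a pivoted QR.
X <- model.matrix(y ~ f1 + f2 + x, data = toy)
qr_X <- qr(X)
rank_X <- qr_X$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")

# The first rank_X pivot entries index a linearly independent set of columns.
keep <- sort(qr_X$pivot[seq_len(rank_X)])
X_full <- X[, keep, drop = FALSE]

# Fit on the reduced matrix; lm.fit takes a matrix and a response vector.
fit <- lm.fit(X_full, toy$y)
print(fit$coefficients)
```

Note that lm.fit returns a plain list rather than an "lm" object, so helpers such as predict and vif would not apply to it directly.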
I have several factors with hundreds of levels:
> total_levels <- full_data %>% purrr::map(levels) %>% map(length)
> Reduce("+", total_levels)
[1] 1271
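The same count can be sketched in base R without the pipe, which also makes it easy to see the levels per column. The data frame toy below is a made-up stand-in for full_data.

```r
# Toy data (hypothetical): two factor columns and one numeric column.
toy <- data.frame(
  a = factor(c("x", "y", "z", "x")),
  b = factor(c("u", "u", "v", "v")),
  n = c(1.5, 2.5, 3.5, 4.5)
)

# nlevels() is 0 for non-factor columns, so they do not affect the sum.
levels_per_column <- vapply(toy, nlevels, integer(1))
total_levels <- sum(levels_per_column)

print(levels_per_column)
print(total_levels)
```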
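Following a suggestion from @BenBolker in the comments, I tried lm on a generic example with a large number of terms on the RHS (2582 columns, enough to match the number of terms I was attempting in my example above). This works fine: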
dd <- as.data.frame(matrix(rnorm(4179*2582),ncol=2582)); m1 <- lm(V1 ~ ., data=dd)
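One caveat about this check: independent Gaussian columns give a full-rank design, whereas the real model is rank-deficient. A closer stand-in would include at least one dependent column. The sketch below is a deliberately small, made-up variant (the column dup and the sizes are assumptions) that shows how lm handles such a column.

```r
# Small synthetic design with one exactly dependent column (hypothetical).
set.seed(7)
n <- 200
base <- as.data.frame(matrix(rnorm(n * 20), ncol = 20))
dd_small <- cbind(base, dup = base$V2 + base$V3)

# lm still fits, but the dependent column gets an NA coefficient.
m_small <- lm(V1 ~ ., data = dd_small)
n_aliased <- sum(is.na(coef(m_small)))

cat("coefficients:", length(coef(m_small)),
    " rank:", m_small$rank,
    " aliased:", n_aliased, "\n")
```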
Session info (although this was captured before running the code block that triggers the fatal error):
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] car_3.0-2 carData_3.0-2 rattle_5.2.0 rpart_4.1-13 caret_6.0-81 lattice_0.20-38 DT_0.5
[8] plotly_4.8.0 generics_0.0.2 broom_0.5.1 lubridate_1.7.4 janitor_1.1.1 ggrepel_0.8.0 rio_0.5.16
[15] data.table_1.12.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.0.1 purrr_0.3.1 readr_1.3.1 tidyr_0.8.3
[22] tibble_2.0.1 ggplot2_3.1.0 tidyverse_1.2.1 kableExtra_1.0.1
loaded via a namespace (and not attached):
[1] nlme_3.1-137 RColorBrewer_1.1-2 webshot_0.5.1 httr_1.4.0 tools_3.5.2 backports_1.1.3 R6_2.4.0
[8] lazyeval_0.2.1 colorspace_1.4-0 nnet_7.3-12 withr_2.1.2 tidyselect_0.2.5 curl_3.3 compiler_3.5.2
[15] cli_1.0.1 rvest_0.3.2 xml2_1.2.0 bookdown_0.9 scales_1.0.0 digest_0.6.18 foreign_0.8-71
[22] rmarkdown_1.11 pkgconfig_2.0.2 htmltools_0.3.6 htmlwidgets_1.3 rlang_0.3.1 readxl_1.3.0 rstudioapi_0.9.0
[29] shiny_1.2.0 jsonlite_1.6 crosstalk_1.0.0 ModelMetrics_1.2.2 zip_2.0.0 magrittr_1.5 Matrix_1.2-15
[36] Rcpp_1.0.0 munsell_0.5.0 abind_1.4-5 stringi_1.3.1 yaml_2.2.0 MASS_7.3-51.1 plyr_1.8.4
[43] recipes_0.1.4 grid_3.5.2 promises_1.0.1 crayon_1.3.4 haven_2.1.0 splines_3.5.2 hms_0.4.2
[50] knitr_1.22 pillar_1.3.1 reshape2_1.4.3 codetools_0.2-15 stats4_3.5.2 glue_1.3.0 evaluate_0.13
[57] blogdown_0.11 rpart.plot_3.0.6 modelr_0.1.4 httpuv_1.4.5.1 foreach_1.4.4 cellranger_1.1.0 gtable_0.2.0
[64] assertthat_0.2.0 xfun_0.5 gower_0.2.0 openxlsx_4.1.0 mime_0.6 prodlim_2018.04.18 xtable_1.8-3
[71] later_0.8.0 class_7.3-14 survival_2.43-3 viridisLite_0.3.0 timeDate_3043.102 iterators_1.0.10 lava_1.6.5
[78] ipred_0.9-8
Solution