首页 > 解决方案 > 如何对行子集执行线性回归并将趋势和相关值存储在 R 中的另一个数据框中

问题描述

我有这个数据框:

# A tibble: 6 x 5
  STATE STATEFP YEAR  GW_ratio SW_ratio
  <chr>   <dbl> <chr>    <dbl>    <dbl>
1 AL          1 2000    0.0425    0.958
2 AL          1 2005    0.0474    0.953
3 AL          1 2010    0.0469    0.953
4 AL          1 2015    0.0563    0.944
5 AR          5 2000    0.627     0.373
6 AR          5 2005    0.646     0.354

我想将趋势作为每个四年期间(2000、2005、2010、2015)每个州的 SW_ratio 和 GW_ratio 之间的关系,您可以使用lm. 然后我想将它存储在以下数据框中:

  STATE STATEFP trend
1    AL       1     0
2    AR       5     0
3    AZ       4     0
4    CA       6     0
5    CO       8     0
6    CT       9     0

有谁知道如何对 R 中的行子集执行回归分析?

标签: rregressionsubset

解决方案


假设您的数据是这样的:

set.seed(111)
df = data.frame(STATE  =rep(c("AL","AR"),each=4),
STATEFP = rep(c(1,5),each=4),
YEAR = rep(c(2000,2005,2010,2015),2),
GW_ratio = runif(8), SW_ratio = runif(8))

head(df)

  STATE STATEFP YEAR  GW_ratio   SW_ratio
1    AL       1 2000 0.5929813 0.43216062
2    AL       1 2005 0.7264811 0.09368152
3    AL       1 2010 0.3704220 0.55577991
4    AL       1 2015 0.5149238 0.59022849
5    AR       5 2000 0.3776632 0.06714114
6    AR       5 2005 0.4183373 0.04754785

一种方法是将其旋转很长时间,并按状态和变量分组,然后执行线性回归:

library(dplyr)
library(tidyr)
library(broom)

df %>% 
pivot_longer(-c(STATE,YEAR,STATEFP)) %>% 
group_by(STATE,STATEFP,name) %>% 
do(augment(lm(value ~ YEAR,data=.))) 

你会得到这样的东西:

# A tibble: 16 x 11
# Groups:   STATE, STATEFP, name [4]
   STATE STATEFP name      value  YEAR  .fitted  .resid  .hat  .sigma .cooksd
   <fct>   <dbl> <chr>     <dbl> <dbl>    <dbl>   <dbl> <dbl>   <dbl>   <dbl>
 1 AL          1 GW_ratio 0.593   2000  0.640   -0.0468   0.7 0.204    0.347 
 2 AL          1 GW_ratio 0.726   2005  0.581    0.146    0.3 0.137    0.265 
 3 AL          1 GW_ratio 0.370   2010  0.522   -0.151    0.3 0.128    0.286 
 4 AL          1 GW_ratio 0.515   2015  0.463    0.0523   0.7 0.200    0.433 
 5 AL          1 SW_ratio 0.432   2000  0.278    0.155    0.7 0.175    1.69  
 6 AL          1 SW_ratio 0.0937  2005  0.371   -0.277    0.3 0.0146   0.428 
 7 AL          1 SW_ratio 0.556   2010  0.465    0.0910   0.3 0.314    0.0460
 8 AL          1 SW_ratio 0.590   2015  0.558    0.0318   0.7 0.327    0.0715

剩下的就是再次将其旋转,所以这里是完整的步骤:

res = df %>% 
pivot_longer(-c(STATE,YEAR,STATEFP)) %>% 
group_by(STATE,STATEFP,name) %>% 
do(augment(lm(value ~ YEAR,data=.))) %>% 
select(c(STATE,STATEFP,name,YEAR,.fitted)) %>% 
ungroup() %>% 
pivot_wider(names_from=c(name,YEAR),values_from=.fitted)

data.frame(res)
  STATE STATEFP GW_ratio_2000 GW_ratio_2005 GW_ratio_2010 GW_ratio_2015
1    AL       1     0.6397368     0.5807136     0.5216905     0.4626673
2    AR       5     0.3263059     0.3319276     0.3375492     0.3431709
  SW_ratio_2000 SW_ratio_2005 SW_ratio_2010 SW_ratio_2015
1   0.277517333     0.3711475     0.4647777     0.5584079
2  -0.007647359     0.1170041     0.2416555     0.3663070

如果您只想在每个状态下保持 GW_ratio 与 SW_ratio 的斜率,那么它将是:

df %>% group_by(STATE,STATEFP) %>% do(tidy(lm(GW_ratio ~ SW_ratio,data=.))) %>% filter(term=="SW_ratio")
# A tibble: 2 x 7
# Groups:   STATE, STATEFP [2]
  STATE STATEFP term     estimate std.error statistic p.value
  <fct>   <dbl> <chr>       <dbl>     <dbl>     <dbl>   <dbl>
1 AL          1 SW_ratio   -0.567     0.234    -2.43    0.136
2 AR          5 SW_ratio    0.436     0.810     0.539   0.644

推荐阅读