How to do logistic regression on summarized data in R?

Question

So I have some data structured like this:

         | Works  | DoesNotWork |
         ----------------------- 
Unmarried| 130    | 235         |
Married  | 10     | 95          |

I'm trying to use logistic regression to predict Work Status from Marriage Status, but I don't think I understand how to do this in R. For example, if my data looked like this:

MarriageStatus  | WorkStatus| 
-----------------------------
Married         | No        |
Married         | No        |
Married         | Yes       |
Unmarried       | No        |
Unmarried       | Yes       |
Unmarried       | Yes       |

I understand that I can do the following:

log_model <- glm(WorkStatus ~ MarriageStatus, data=MarriageDF, family=binomial(logit))
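A minimal runnable version of that call, assuming the six-row toy data above is stored in a data frame named `MarriageDF` (the construction below is an illustrative reconstruction, not the asker's actual object). Making `WorkStatus` a factor with levels `No`, `Yes` lets `glm` treat the first level as failure and model the probability of `Yes`:

```r
# Hypothetical reconstruction of the six-row toy data from the question
MarriageDF <- data.frame(
  MarriageStatus = factor(c("Married", "Married", "Married",
                            "Unmarried", "Unmarried", "Unmarried")),
  WorkStatus     = factor(c("No", "No", "Yes", "No", "Yes", "Yes"),
                          levels = c("No", "Yes"))
)

# With a factor response, binomial() models P(WorkStatus == "Yes");
# "Married" is the reference level of the predictor (alphabetical order)
log_model <- glm(WorkStatus ~ MarriageStatus, data = MarriageDF,
                 family = binomial(logit))
coef(log_model)
```

Here the intercept is the log odds of working for the married group (1 of 3, so log(1/2)) and the `MarriageStatusUnmarried` coefficient is the log odds ratio relative to that group.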

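With the summarized data, though, I just don't understand how to do it. Do I need to expand the data back into non-summarized form, code Married/Unmarried as 0/1, do the same for Working/Not Working (coded as 0/1), and then fit the model the same way?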
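Expanding is one workable option, though not a required one. A sketch, assuming the counts from the first table are put into a data frame with hypothetical column names `married`, `works` and `n`: each cell is repeated `n` times with `rep()`, which gives one 0/1 row per person, and the ordinary individual-level `glm` call then applies unchanged.

```r
# Summary counts from the first table (column names are illustrative)
counts <- data.frame(
  married = c(0, 0, 1, 1),      # 0 = Unmarried, 1 = Married
  works   = c(1, 0, 1, 0),      # 1 = Works, 0 = DoesNotWork
  n       = c(130, 235, 10, 95)
)

# Repeat each row n times: one row per individual (130 + 235 + 10 + 95 = 470)
long <- counts[rep(seq_len(nrow(counts)), counts$n), c("married", "works")]

# Standard logistic regression on the expanded 0/1 data
fit_long <- glm(works ~ married, data = long, family = binomial)
coef(fit_long)
```

The coefficients are identical to fitting the grouped counts directly; expansion only costs memory when the counts are large.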

Given only the first (summary) data frame, how would I write the logistic regression with `glm`? Something like this?

log_summary_model <- glm(Works ~ DoesNotWork, data=summaryDF, family=binomial(logit))

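But that doesn't make sense, because the response (dependent) variable would be split across two columns, one of which ends up as a predictor?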
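The response does not have to be split like that. For grouped binomial data, R's `glm` also accepts a two-column matrix `cbind(successes, failures)` as the response. A sketch, assuming `summaryDF` holds the first table with a hypothetical `MarriageStatus` column next to `Works` and `DoesNotWork` (this is an alternative to the frequency-weight approach, not the method used in the answer):

```r
# Hypothetical layout of the summary table from the question
summaryDF <- data.frame(
  MarriageStatus = factor(c("Unmarried", "Married"),
                          levels = c("Unmarried", "Married")),
  Works          = c(130, 10),
  DoesNotWork    = c(235, 95)
)

# Two-column response: first column = successes, second = failures
log_summary_model <- glm(cbind(Works, DoesNotWork) ~ MarriageStatus,
                         data = summaryDF, family = binomial(logit))
summary(log_summary_model)
```

With Unmarried as the reference level, the estimates should match the weighted model (intercept about -0.592, `MarriageStatusMarried` about -1.659). The reported deviances differ, because this fit uses 2 grouped rows and is saturated (residual deviance 0 on 0 degrees of freedom).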

I'm not sure whether I'm overcomplicating this; any help would be greatly appreciated, thanks!

Tags: r, logistic-regression

Solution


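You need to convert the contingency table into a data frame with one row per cell (long format, with a frequency column). You can then fit the logit model using the frequency counts as the weights: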

mod <- glm(works ~ marriage, df, family = binomial, weights = freq)
summary(mod) 

Call:
glm(formula = works ~ marriage, family = binomial, data = df, 
    weights = freq)

Deviance Residuals: 
      1        2        3        4  
 16.383    6.858  -14.386   -4.361  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.5921     0.1093  -5.416 6.08e-08 ***
marriage     -1.6592     0.3500  -4.741 2.12e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 572.51  on 3  degrees of freedom
Residual deviance: 541.40  on 2  degrees of freedom
AIC: 545.4

Number of Fisher Scoring iterations: 5
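Because `marriage` is a single 0/1 predictor, the estimates have closed forms that make the output easy to check: the intercept is the log odds of working among the unmarried, the slope is the log odds ratio, and each standard error is the square root of the summed reciprocal cell counts involved (for the slope this is Woolf's formula). A quick arithmetic check, with illustrative variable names:

```r
# Cell counts from the table in the question
w_u <- 130; nw_u <- 235   # unmarried: works / does not work
w_m <- 10;  nw_m <- 95    # married:   works / does not work

intercept <- log(w_u / nw_u)                      # log odds, unmarried
slope     <- log((w_m / nw_m) / (w_u / nw_u))     # log odds ratio
se_int    <- sqrt(1 / w_u + 1 / nw_u)
se_slope  <- sqrt(1 / w_u + 1 / nw_u + 1 / w_m + 1 / nw_m)

round(c(intercept = intercept, slope = slope,
        se_int = se_int, se_slope = se_slope), 4)
```

These reproduce -0.5921, -1.6592, 0.1093 and 0.3500 from the `summary()` output above.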

Data:

df <- read.table(text = "works marriage freq
                 1 0 130
                 1 1 10
                 0 0 235
                 0 1 95", header = TRUE)
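Typing the long data by hand is fine for four cells. For bigger tables, one option (a sketch; object names are illustrative) is to store the counts as an R `table` and let `as.data.frame()` produce one row per cell with a `Freq` column, which then serves as the weight:

```r
# The question's 2x2 table (rows = marriage, columns = work status)
tab <- as.table(matrix(
  c(130, 10, 235, 95), nrow = 2,
  dimnames = list(marriage = c("Unmarried", "Married"),
                  works    = c("Works", "DoesNotWork"))
))

# One row per cell: factor columns marriage, works, plus Freq
cells <- as.data.frame(tab)

# Recode to 0/1: works = 1 if "Works", marriage = 1 if "Married"
cells$works    <- as.integer(cells$works == "Works")
cells$marriage <- as.integer(cells$marriage == "Married")

mod2 <- glm(works ~ marriage, data = cells, family = binomial, weights = Freq)
coef(mod2)
```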
