Set of variables used for weighing up changes the resulting estimates

Question

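While comparing how the set of variables passed to svyby affects the resulting estimates and standard errors, I found that weighing up a single variable and weighing up two variables produce identical estimates, but weighing up multiple variables produces estimates that are markedly lower than those from the other two approaches.

What causes this, and how can I avoid it?

Link to the dataset: https://drive.google.com/open?id=1xqFxUBLZifaz57yvoNFOcvhBDGuHuSMq

Here is my code: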

library(tidyverse)
library(survey)

load("des2004small.RData")

weighUp <- function(variables) {
  svyby(formula = make.formula(variables), by = ~statefip, 
        design = des2004small,  
        FUN = svytotal, na.rm = TRUE)
}

# Weigh up a single variable:
dfstate2004_singleVariable = weighUp(c("race_acs"))
# Weigh up two variables:
dfstate2004_twoVariables = weighUp(c("race_acs", "cvap_acs"))
# Weigh up multiple variables:
dfstate2004_multipleVariables = weighUp(c("race_acs", "cit_acs", 
                                          "educ_acs", "unemployed_acs", "labforce_acs", "poverty_acs", "cvap_acs"))

# Compare the three different methods:
comparison2004 = dfstate2004_singleVariable %>% 
  inner_join(dfstate2004_twoVariables, by = "statefip", suffix = c(".single", ".two")) %>%
  inner_join(dfstate2004_multipleVariables, by = "statefip", suffix = c("", ".multiple"))

race_acswhite2004 = comparison2004 %>% 
  select(statefip, 
         single = race_acswhite.single, 
         two = race_acswhite.two, 
         multiple = race_acswhite)
race_acswhite2004

These are the differing estimates that result:

+-------------------------------------+
|   statefip  single     two multiple |
+-------------------------------------+
| 1        1 3084123 3084123  2128346 |
| 2        2  427008  427008   277075 |
+-------------------------------------+

Tags: r, survey

Answer


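The variables in the 'multiple' table have missing values, and svytotal drops any observation that has a missing value on any of the variables it is analysing. Strictly speaking, by default it returns NA for the result, but if you ask it to drop missing values with na.rm=TRUE, it drops them together with the whole observation. Because race_acs is totalled only over the rows that are complete on all seven variables, its estimate in the 'multiple' call is lower than in the single-variable and two-variable calls.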
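One way to avoid the problem is to total each variable in its own svyby call, so that na.rm = TRUE only removes rows that are missing on that particular variable. The sketch below illustrates this on a small made-up data set; the data frame toy, its columns, and the helper one_at_a_time are hypothetical names introduced here for illustration and are not part of the original post or of the survey package.

```r
library(survey)

# Hypothetical toy data: two states, equal weights, one missing value in educ.
toy <- data.frame(
  statefip = c(1, 1, 1, 2, 2, 2),
  race     = factor(c("white", "white", "black", "white", "black", "white")),
  educ     = factor(c("hs", NA, "college", "hs", "hs", "college")),
  wt       = rep(100, 6)
)
des <- svydesign(ids = ~1, weights = ~wt, data = toy)

# Quick diagnostic: how many missing values does each variable have?
colSums(is.na(toy[c("race", "educ")]))

# Hypothetical helper: one svyby call per variable, so rows are dropped
# only when they are missing on the variable currently being totalled.
one_at_a_time <- function(variables, design) {
  lapply(setNames(variables, variables), function(v) {
    svyby(make.formula(v), by = ~statefip, design = design,
          FUN = svytotal, na.rm = TRUE)
  })
}

separate <- one_at_a_time(c("race", "educ"), des)

# Joint call for comparison: a row missing on educ is also dropped
# from the race totals.
joint <- svyby(~race + educ, by = ~statefip, design = des,
               FUN = svytotal, na.rm = TRUE)

separate$race$racewhite  # race totals use all rows with non-missing race
joint$racewhite          # state 1 loses the row whose educ is NA
```

In this toy example the joint call gives a smaller white total for state 1 than the separate call, which mirrors the pattern in the question. The separate results can then be joined by statefip as in the original code. Whether per-variable deletion or complete-case deletion is appropriate depends on the analysis, so treat this as a starting point rather than a definitive fix.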

