Group by category, then find the differences between categories [r]

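Problem description

I am calculating the average employment rate of different groups from 1995 to 2015, and then the differences in average employment rate between those groups.

The result should be ordered by year.

Most of my attempts used the summarise function in dplyr, but they failed.

Below is the code I have set up.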

diff_in_diff <- Cps_total %>% 
  filter(age >= 19 & age <= 44) %>% 
  mutate(women_and_black_men = ifelse(female == 1 & marstat != 1 & nfchild == 0, "Single without children",
                                 ifelse(female == 1 & marstat != 1 & nfchild > 0, "Single with children",
                                    ifelse(female == 1 & marstat == 1 & nfchild == 0, "Married without children",
                                       ifelse(female == 1 & marstat == 1 & nfchild > 0, "Married with children",
                                          ifelse(female == 0 & wbhao == 2, "Black Men", "Otherwise Men"))))))
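As a side note, the nested ifelse() calls above could also be written with dplyr's case_when(), which lists one condition per line. The sketch below is only an illustration: the toy rows standing in for Cps_total are made up, and the codings (marstat == 1 for married, wbhao == 2 for Black) are simply copied from the code above. One difference to keep in mind is that case_when() treats an NA condition as not matched and moves on to the next line, whereas ifelse() returns NA.

```r
library(dplyr)

# Hypothetical toy rows standing in for Cps_total (only the columns used below)
Cps_total <- data.frame(
  age     = c(25, 30, 35, 40, 28, 33),
  female  = c(1, 1, 1, 1, 0, 0),
  marstat = c(2, 2, 1, 1, 1, 2),
  nfchild = c(0, 2, 0, 1, 0, 0),
  wbhao   = c(1, 1, 1, 1, 2, 1)
)

diff_in_diff <- Cps_total %>%
  filter(age >= 19 & age <= 44) %>%
  # conditions are checked top to bottom; the first match wins
  mutate(women_and_black_men = case_when(
    female == 1 & marstat != 1 & nfchild == 0 ~ "Single without children",
    female == 1 & marstat != 1 & nfchild > 0  ~ "Single with children",
    female == 1 & marstat == 1 & nfchild == 0 ~ "Married without children",
    female == 1 & marstat == 1 & nfchild > 0  ~ "Married with children",
    female == 0 & wbhao == 2                  ~ "Black Men",
    TRUE                                      ~ "Otherwise Men"
  ))
```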


diff_in_diff_2 <- diff_in_diff %>% 
  filter(!is.na(empl)) %>% 
  group_by(year, women_and_black_men) %>% 
  summarize(mean_empl=mean(empl))
year |  women_and_black_men      |      mean_empl

1995 |  Black Men                |      0.8772406       
1995 |  Married with children    |      0.6810999       
1995 |  Married without children |      0.8227718       
1995 |  Otherwise Men            |      0.9048232       
1995 |  Single with children     |      0.8330486       
1995 |  Single without children  |      0.8927759       
1996 |  Black Men                |      0.8415265       
1996 |  Married with children    |      0.6800505       
1996 |  Married without children |      0.8188101       
1996 |  Otherwise Men            |      0.9035344   

This is what I got.

However, I want to find the following differences: Single with children minus Black Men, Single with children minus Single without children, Single with children minus Married with children, Single with children minus Married without children, and Single with children minus Otherwise Men.

So my expected output is:

year |  Single_with_children_vs      |      diff_in_diff

1995 |  vs_Married with children     |      0.031230201
1995 |  vs Married without children  |     -0.130002012
1995 |  vs Single_without_children   |     -0.190230201
1995 |  vs Black Men                 |      0.002030210
1996 |
.
.
.

Something like this.

Tags: r, statistics

Solution

Maybe not the most elegant solution, but here is a quick fix:

    # Load the packages used below; drop_na() comes from tidyr
    library(dplyr)
    library(tidyr)

    # I created a basic dataset similar to yours.
    # each = 8 gives every year the full set of groups, so that
    # "single with children" is present in both years
    diff_in_diff <- data.frame(year = rep(1995:1996, each = 8)
                        , women_and_black_men = rep(c("married with children", "married without children", "otherwise men", "single with children", "single without children", "black men", "married with children", "otherwise men"), 2)
                        , empl = abs(rnorm(16, 0, 0.5))
    ) %>% arrange(year)

    # create a dataframe that is just single with children
    diff_in_diff_single <- diff_in_diff %>%
      filter(women_and_black_men == "single with children") %>%
      dplyr::rename("single.emp" = empl)

    # join with our original dataframe and take the difference
    # (single with children minus each group, as asked in the question)
    diff_in_diff %>%
      full_join(diff_in_diff_single, by = c("year")) %>%
      drop_na() %>%
      group_by(year, women_and_black_men.x) %>%
      mutate(diff = single.emp - empl)
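
The same idea can probably be applied directly to the summarised table from the question, without a join. The sketch below is only an illustration under a few assumptions: diff_in_diff_2 is rebuilt here from the 1995 rows printed in the question, the names reference and result are made up for this example, and each year is assumed to contain exactly one "Single with children" row (mutate() would stop with an error for a year that has none).

```r
library(dplyr)

# The 1995 rows of the summary table shown in the question
diff_in_diff_2 <- data.frame(
  year = 1995,
  women_and_black_men = c("Black Men", "Married with children",
                          "Married without children", "Otherwise Men",
                          "Single with children", "Single without children"),
  mean_empl = c(0.8772406, 0.6810999, 0.8227718,
                0.9048232, 0.8330486, 0.8927759)
)

# Illustrative name for the group every other group is compared against
reference <- "Single with children"

result <- diff_in_diff_2 %>%
  group_by(year) %>%
  # within each year: reference group's mean minus every group's mean
  mutate(diff_in_diff = mean_empl[women_and_black_men == reference] - mean_empl) %>%
  ungroup() %>%
  # drop the reference group's comparison with itself (always 0)
  filter(women_and_black_men != reference) %>%
  mutate(Single_with_children_vs = paste("vs", women_and_black_men)) %>%
  select(year, Single_with_children_vs, diff_in_diff) %>%
  arrange(year)

result
```

With the 1995 figures from the question, the differences come out as -0.0441920 (vs Black Men), 0.1519487 (vs Married with children), 0.0102768 (vs Married without children), -0.0717746 (vs Otherwise Men) and -0.0597273 (vs Single without children).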
