首页 > 解决方案 > 取表并按所选变量分层

问题描述

目前正在努力重新调整该表的方向,因为我对 R 和整理数据仍然不熟悉。我有一张关于大学录取的小数据表,我想按部门进行分层,将不同的百分比作为新列(下面的预期结果)。

这是我到目前为止所尝试的,为我提供了以下输出:

library(tidyverse)
ucb %>%
    group_by(Admit, Gender, n, Dept) %>%
    spread(key = Admit, value = n) %>%
    mutate(Prop_Admitted = Admitted / (Admitted + Rejected), 
           Prop_Rejected = Rejected / (Admitted + Rejected)) %>%
    head(n = 20)

# Output
Gender  Dept    Admitted    Rejected    Prop_Admitted   Prop_Rejected
<chr>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>
Female  A   89  19  0.82407407  0.1759259
Female  B   17  8   0.68000000  0.3200000
Female  C   202 391 0.34064081  0.6593592
Female  D   131 244 0.34933333  0.6506667
Female  E   94  299 0.23918575  0.7608142
Female  F   24  317 0.07038123  0.9296188
Male    A   512 313 0.62060606  0.3793939
Male    B   353 207 0.63035714  0.3696429
Male    C   120 205 0.36923077  0.6307692
Male    D   138 279 0.33093525  0.6690647
Male    E   53  138 0.27748691  0.7225131
Male    F   22  351 0.05898123  0.9410188

这是我希望最终得到的表格,当然,填写了所有相应的值。

Dept    Female_Admitted Female_Rejected Male_Admitted   Male_Rejected   Male_Proportion_Admitted    Female_Proportion_Admitted
A   
B   
C   
D   
E   
F   

我只是不确定如何从我的代码转到将性别选项作为列,从而消除重复的部门字母。任何帮助表示赞赏!如果有帮助,我也试图将其保留在 tidyverse 中。

标签: rsummarize

解决方案


你可以使用pivot_wider新的tidyr,它可以接受多个values_from参数

tidyr::pivot_wider(df, names_from = Gender, 
                       values_from = matches("Admitted$|Rejected"))

# A tibble: 6 x 9
#  Dept  Admitted_Female Admitted_Male Rejected_Female Rejected_Male
#  <fct>           <int>         <int>           <int>         <int>
#1 A                  89           512              19           313
#2 B                  17           353               8           207
#3 C                 202           120             391           205
#4 D                 131           138             244           279
#5 E                  94            53             299           138
#6 F                  24            22             317           351
# … with 4 more variables: Prop_Admitted_Female <dbl>,Prop_Admitted_Male <dbl>,
#   Prop_Rejected_Female <dbl>,Prop_Rejected_Male <dbl>

对于旧版本的tidyr,我们可以做

df %>%
   gather(key, value, -Gender, -Dept) %>%
   unite(key, Gender, key) %>%
   spread(key, value)

df在哪里

df <- ucb %>%
        group_by(Admit, Gender, n, Dept) %>%
        spread(key = Admit, value = n) %>%
        mutate(Prop_Admitted = Admitted / (Admitted + Rejected), 
               Prop_Rejected = Rejected / (Admitted + Rejected))

推荐阅读