r - 取表并按所选变量分层
问题描述
目前正在努力重新调整该表的方向,因为我对 R 和整理数据仍然不熟悉。我有一张关于大学录取的小数据表,我想按部门进行分层,将不同的百分比作为新列(下面的预期结果)。
这是我到目前为止所尝试的,为我提供了以下输出:
library(tidyverse)
ucb %>%
group_by(Admit, Gender, n, Dept) %>%
spread(key = Admit, value = n) %>%
mutate(Prop_Admitted = Admitted / (Admitted + Rejected),
Prop_Rejected = Rejected / (Admitted + Rejected)) %>%
head(n = 20)
# Output
Gender Dept Admitted Rejected Prop_Admitted Prop_Rejected
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
Female A 89 19 0.82407407 0.1759259
Female B 17 8 0.68000000 0.3200000
Female C 202 391 0.34064081 0.6593592
Female D 131 244 0.34933333 0.6506667
Female E 94 299 0.23918575 0.7608142
Female F 24 317 0.07038123 0.9296188
Male A 512 313 0.62060606 0.3793939
Male B 353 207 0.63035714 0.3696429
Male C 120 205 0.36923077 0.6307692
Male D 138 279 0.33093525 0.6690647
Male E 53 138 0.27748691 0.7225131
Male F 22 351 0.05898123 0.9410188
这是我希望最终得到的表格,当然,填写了所有相应的值。
Dept Female_Admitted Female_Rejected Male_Admitted Male_Rejected Male_Proportion_Admitted Female_Proportion_Admitted
A
B
C
D
E
F
我只是不确定如何从我的代码转到将性别选项作为列,从而消除重复的部门字母。任何帮助表示赞赏!如果有帮助,我也试图将其保留在 tidyverse 中。
解决方案
你可以使用pivot_wider
新的tidyr
,它可以接受多个values_from
参数
tidyr::pivot_wider(df, names_from = Gender,
values_from = matches("Admitted$|Rejected"))
# A tibble: 6 x 9
# Dept Admitted_Female Admitted_Male Rejected_Female Rejected_Male
# <fct> <int> <int> <int> <int>
#1 A 89 512 19 313
#2 B 17 353 8 207
#3 C 202 120 391 205
#4 D 131 138 244 279
#5 E 94 53 299 138
#6 F 24 22 317 351
# … with 4 more variables: Prop_Admitted_Female <dbl>,Prop_Admitted_Male <dbl>,
# Prop_Rejected_Female <dbl>,Prop_Rejected_Male <dbl>
对于旧版本的tidyr
,我们可以做
df %>%
gather(key, value, -Gender, -Dept) %>%
unite(key, Gender, key) %>%
spread(key, value)
df
在哪里
df <- ucb %>%
group_by(Admit, Gender, n, Dept) %>%
spread(key = Admit, value = n) %>%
mutate(Prop_Admitted = Admitted / (Admitted + Rejected),
Prop_Rejected = Rejected / (Admitted + Rejected))