How do I cross-tabulate observations against membership in multiple categories?

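Problem description

I have a dataset whose observations belong to both mutually exclusive and non-mutually-exclusive categories. For example, suppose nobody is of mixed race but people can hold multiple citizenships, so the dataset looks like this: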

 id white hispanic asian usa canada uk
 1     0        1     0   1      0  1
 2     1        0     0   0      1  0
 3     0        0     1   1      0  1
 4     1        0     0   1      1  0
 5     0        1     0   0      0  1
 6     0        0     1   0      0  1

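As you can see, every person/observation has exactly one race but may hold several citizenships. I would like to break down race by citizenship and produce something like this: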

         usa       canada    uk        total
white     1 (33%)   2 (66%)   0         3
hispanic  1 (33%)   0         2 (66%)   3
asian     1 (33%)   0         2 (66%)   3
total     3         2         4

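How can I write a loop that summarizes each category so that I can cross-tabulate race against citizenship (double counting is fine)?

Any suggestions on how to visualize this kind of data would be greatly appreciated. Thank you very much for your help!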

Tags: r, categorical-data, crosstab

Solution


As I understand it, you can reshape the data into tidy (long) format and then use janitor to build the cross-tabulation:

Data:

df <- data.frame(id = seq(1,6),
                 white = c(0,1,0,1,0,0),
                 hispanic = c(1,0,0,0,1,0),
                 asian = c(0,0,1,0,0,1),
                 usa = c(1,0,1,1,0,0),
                 canada = c(0,1,0,1,0,0),
                 uk = c(1,0,1,0,1,1)) 

Code:

library(tidyverse)
library(janitor)

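# Note: "Origin" ends up holding the race and "ethnicity" the citizenship.
df %>% 
  # Stack the race indicators and keep each person's single race
  pivot_longer(cols = c(white, hispanic, asian), names_to = "Origin") %>% 
  filter(value == 1) %>% 
  select(-value) %>% 
  # Stack the citizenship indicators; one row per citizenship held
  pivot_longer(cols = c(usa, canada, uk), names_to = "ethnicity") %>% 
  filter(value == 1) %>% 
  select(-value) %>% 
  # Count race by citizenship, then add totals and column percentages
  tabyl(Origin, ethnicity) %>% 
  adorn_totals(where = c("row","col")) %>% 
  adorn_percentages(denominator = "col") %>% 
  adorn_pct_formatting(digits = 0) %>% 
  adorn_ns(position = "front")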

Output:

   Origin   canada       uk      usa    Total
    asian 0   (0%) 2  (50%) 1  (33%) 3  (33%)
 hispanic 0   (0%) 2  (50%) 1  (33%) 3  (33%)
    white 2 (100%) 0   (0%) 1  (33%) 3  (33%)
    Total 2 (100%) 4 (100%) 3 (100%) 9 (100%)
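
The percentages above are column percentages, whereas the table sketched in the question shows row percentages (each race split across citizenships). As a possible alternative, here is a minimal base R sketch that gets the same counts without reshaping: because the columns are 0/1 indicators, the matrix product of the transposed race columns with the citizenship columns counts the people who have both. The names race_cols, citizen_cols, counts, row_pct and with_totals are illustrative choices, not part of the original answer, and the grouped bar chart at the end is only one simple way to visualize the result.

```r
# Same example data as above.
df <- data.frame(id = 1:6,
                 white = c(0,1,0,1,0,0),
                 hispanic = c(1,0,0,0,1,0),
                 asian = c(0,0,1,0,0,1),
                 usa = c(1,0,1,1,0,0),
                 canada = c(0,1,0,1,0,0),
                 uk = c(1,0,1,0,1,1))

# Hypothetical helper names for the two groups of indicator columns.
race_cols <- c("white", "hispanic", "asian")
citizen_cols <- c("usa", "canada", "uk")

# crossprod(x, y) is t(x) %*% y: cell [r, c] is the number of people
# with race r who also hold citizenship c (dual citizens count twice).
counts <- crossprod(as.matrix(df[race_cols]), as.matrix(df[citizen_cols]))

# Row percentages: share of each race's citizenships, rounded to whole percent.
row_pct <- round(100 * prop.table(counts, margin = 1))

# Append a total column, then a total row.
with_totals <- cbind(counts, total = rowSums(counts))
with_totals <- rbind(with_totals, total = colSums(with_totals))

print(with_totals)
print(row_pct)

# One simple visualization: bars grouped by citizenship, one bar per race.
barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "citizenship", ylab = "number of people")
```

With janitor, the row-percentage version should only require changing denominator = "col" to denominator = "row" in the pipeline above.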
