Counting several categorical columns in R

Problem description

I have survey data in which respondents rated several items on a Likert scale, as shown below:

id  item1                   item2                   item3                   item4

42  Moderately adequate     Completely adequate     Very adequate           Very adequate
48  Moderately adequate     Moderately adequate     Moderately adequate     Moderately adequate
49  Moderately adequate     Moderately adequate     Moderately adequate     Moderately adequate
50  Slightly adequate       Slightly adequate       Slightly adequate       Not at all adequate

I want to convert this into a data structure that holds, for each item, a count of each rating it received, like this:

rating              item1       item2       item3       item4

Not at all adequate     0           0           0           1
Slightly adequate       1           1           1           0
Moderately adequate     3           2           2           2
Very adequate           0           0           1           1
Completely adequate     0           1           0           0

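What is the most efficient way to reshape this data? I tried dcast(data = melt(data, id.vars = "id"), value ~ .), but that pools the ratings across all four items instead of keeping each item in its own column; count and tally have the same problem. I could do this item by item and then merge the columns back together, but it seems there must be a simpler way, especially since I need to repeat this for several different lists of items.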

Tags: r, dplyr, tidyverse, data-cleaning, survey

Solution


Reshape the data to long format, count the item/rating pairs, and pivot back to wide format:

library(dplyr)
library(tidyr)

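data %>%
  # long format: one row per (id, item); default column names are name and value
  pivot_longer(cols = -id) %>%
  # number of responses for each item/rating pair
  count(name, value) %>%
  # one column per item; ratings an item never received are filled with 0
  pivot_wider(names_from = name, values_from = n, values_fill = list(n = 0))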

# A tibble: 5 x 5
#  value               item1 item2 item3 item4
#  <chr>               <int> <int> <int> <int>
#1 Moderately_adequate     3     2     2     2
#2 Slightly_adequate       1     1     1     0
#3 Completely_adequate     0     1     0     0
#4 Very_adequate           0     0     1     1
#5 Not_at_all_adequate     0     0     0     1
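
The rows above come out in order of first appearance, whereas the table in the question lists the ratings in scale order. One possible way to get that ordering, sketched here as an assumption rather than as part of the original answer, is to turn the rating into a factor with explicit levels and sort on it. The names survey, rating_levels, item and rating are illustrative; the level order is taken from the desired output in the question.

```r
library(dplyr)
library(tidyr)

# Same four respondents as in the answer's data, rebuilt here so the
# sketch runs on its own (underscores instead of spaces, as in the answer).
survey <- data.frame(
  id    = c(42L, 48L, 49L, 50L),
  item1 = c("Moderately_adequate", "Moderately_adequate",
            "Moderately_adequate", "Slightly_adequate"),
  item2 = c("Completely_adequate", "Moderately_adequate",
            "Moderately_adequate", "Slightly_adequate"),
  item3 = c("Very_adequate", "Moderately_adequate",
            "Moderately_adequate", "Slightly_adequate"),
  item4 = c("Very_adequate", "Moderately_adequate",
            "Moderately_adequate", "Not_at_all_adequate"),
  stringsAsFactors = FALSE
)

# Assumed scale order, lowest to highest, copied from the question's table.
rating_levels <- c("Not_at_all_adequate", "Slightly_adequate",
                   "Moderately_adequate", "Very_adequate",
                   "Completely_adequate")

counts <- survey %>%
  pivot_longer(cols = -id, names_to = "item", values_to = "rating") %>%
  # a factor sorts by its level order rather than alphabetically
  mutate(rating = factor(rating, levels = rating_levels)) %>%
  count(item, rating) %>%
  pivot_wider(names_from = item, values_from = n,
              values_fill = list(n = 0)) %>%
  arrange(rating)

counts
```

A rating that no respondent chose for any item would still be missing from this result, since only observed item/rating pairs are counted.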

Data

I added underscores to the values in the item columns because data containing spaces is hard to copy.

data <- structure(list(id = c(42L, 48L, 49L, 50L),item1 = c("Moderately_adequate",
"Moderately_adequate", "Moderately_adequate", "Slightly_adequate"
), item2 = c("Completely_adequate", "Moderately_adequate", "Moderately_adequate",
"Slightly_adequate"), item3 = c("Very_adequate", "Moderately_adequate", 
"Moderately_adequate", "Slightly_adequate"), item4 = c("Very_adequate", 
"Moderately_adequate", "Moderately_adequate", "Not_at_all_adequate"
)), class = "data.frame", row.names = c(NA, -4L))
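
Since the question mentions repeating the count for several different lists of items, the same pipeline could be wrapped in a small function. This is only a sketch: count_ratings, its items argument, and the survey data frame are hypothetical names introduced here, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: count the ratings for the item columns named in
# `items`, returning one row per rating and one column per item.
count_ratings <- function(df, items) {
  df %>%
    select(all_of(items)) %>%
    pivot_longer(cols = everything(),
                 names_to = "item", values_to = "rating") %>%
    count(item, rating) %>%
    pivot_wider(names_from = item, values_from = n,
                values_fill = list(n = 0))
}

# Small stand-in for the survey data so the sketch runs on its own.
survey <- data.frame(
  id    = c(42L, 48L, 49L, 50L),
  item1 = c("Moderately_adequate", "Moderately_adequate",
            "Moderately_adequate", "Slightly_adequate"),
  item2 = c("Completely_adequate", "Moderately_adequate",
            "Moderately_adequate", "Slightly_adequate"),
  stringsAsFactors = FALSE
)

# Each call handles one list of items.
first_block <- count_ratings(survey, c("item1", "item2"))
item1_only  <- count_ratings(survey, "item1")
first_block
```

Passing the column names as a character vector keeps each item list as plain data, so several lists can be stored and looped over, for example with lapply.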
