dcast with text values

Question

I am looking to spread or dcast a data.frame whose values are text strings.

df = data.frame(employeeid = c(1,1,2,2),
                question=c('do you like milk?', 'do you like apples?', 'do you like milk?', 'do you like apples?'),
                Answer=c('Yes','No','No','No'))

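I would like to convert this to a wide format in which the column headers are the employee ID and the questions. I have tried df = spread(df, question, Answer), but that does not seem to do it.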

Tags: r, tidyr

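Answer

Since you have dcast in your title, I'll assume data.table. Its dcast method expects a data.table, so df is converted first:

data.table::dcast(data.table::as.data.table(df), question ~ employeeid, value.var = "Answer")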
#              question   1  2
# 1 do you like apples?  No No
# 2   do you like milk? Yes No

Another option:

tidyr::spread(df, employeeid, Answer)
#              question   1  2
# 1 do you like apples?  No No
# 2   do you like milk? Yes No
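
As a hedged aside: in current tidyr, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same reshape, assuming tidyr 1.0.0 or later is installed and using the sample df from the question. The names wide and wide_by_employee are illustrative only.

```r
library(tidyr)

# Sample data from the question
df <- data.frame(
  employeeid = c(1, 1, 2, 2),
  question = c("do you like milk?", "do you like apples?",
               "do you like milk?", "do you like apples?"),
  Answer = c("Yes", "No", "No", "No")
)

# One row per question, one column per employee ID
wide <- pivot_wider(df, names_from = employeeid, values_from = Answer)

# One row per employee, one column per question
wide_by_employee <- pivot_wider(df, names_from = question, values_from = Answer)
```

The second call may be closer to what the question asks for if each employee should occupy a single row.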

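Edit: since it looks like you have duplicates in your data, you can pick the "most common" answer:

most <- function(x) names(sort(table(x), decreasing = TRUE))[1]  # sort descending so the most frequent value comes first
data.table::dcast(data.table::as.data.table(df), question ~ employeeid, value.var = "Answer", fun.aggregate = most)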
#              question   1   2
# 1 do you like apples? Yes Yes
# 2   do you like milk?  No Yes
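
To illustrate the duplicate case, here is a minimal sketch using base R only. The data frame dup_df is hypothetical (it is not the asker's data): employee 1 answers the same question three times. The helper name most_common is also just for illustration.

```r
# Hypothetical data with repeated (employeeid, question) pairs
dup_df <- data.frame(
  employeeid = c(1, 1, 1, 2),
  question = rep("do you like milk?", 4),
  Answer = c("Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose (employeeid, question) pair has already appeared
dup_rows <- duplicated(dup_df[c("employeeid", "question")])

# Most frequent value; ties resolve to the first name in table() order
most_common <- function(x) names(which.max(table(x)))

# Collapse to one answer per (employeeid, question) pair
collapsed <- aggregate(Answer ~ employeeid + question,
                       data = dup_df, FUN = most_common)
```

Once each (employeeid, question) pair occurs only once, the collapsed data can be reshaped with dcast or spread as above without an aggregation function.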
