Is there a function in dcast that allows me to include additional conditions?

Problem description

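I am trying to create a wide-format dataset that contains only some of my long-format data. The data come from learners working through an online learning module, where they sometimes get "stuck" on a screen, so multiple attempts are recorded for that screen.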

lesson_long <- data.frame (id  = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279, 4256308, 4256308, 4256308, 4256308),
                           screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2", "survey1", "survey1", "survey2", "survey2"),
                           question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
                           variable = c("age", "country", "age", "country", "education", "course", "age", "country", "education", "course"),
                           response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1 ))

id       screen     question_attempt variable response
4256279  survey1            1           age       0
4256279  survey1            1         country     5
4256279  survey1            2           age       20
4256279  survey1            2         country     5
4256279  survey2            1        education    3
4256279  survey2            1         course      2
4256308  survey1            1           age       18
4256308  survey1            1         country     5
4256308  survey2            1        education    4
4256308  survey2            1         course      1

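For my analysis, I only need their responses from the last attempt on each screen (that is, the responses at their maximum question_attempt; some learners have as many as 8 or 9 attempts on a screen). All earlier attempts should be discarded, and I do not need the screen name in the final dataset. The final wide format would look like this: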

id        age  country education course
4256279   20     5         3         2
4256308   18     5         4         1

I have been trying to do this with dcast (unsuccessfully):

lesson_wide <- dcast(lesson_long, `id` ~ variable, value.var = "response", fun.aggregate = max("question_attempt"), fill=0)

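fun.aggregate obviously does not work the way I made it up here... but is there a solution? Or do I need an extra step to select the data before using dcast? If that is the solution, how would I do it?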

I would love to see your answers. Thanks in advance!

Tags: r, reshape, melt, dcast

Solution

You can order the data by id, screen, and question_attempt, and then select the last value in each group, which is the response from the highest question_attempt:

library(data.table)

setDT(lesson_long)

dcast(lesson_long[order(id, screen, question_attempt)], 
      id~variable, value.var = 'response', fun.aggregate = last, fill = NA)

#        id age country course education
#1: 4256279  20       5      2         3
#2: 4256308  18       5      1         4
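
The question also asks whether the rows could be selected in a separate step before calling dcast. A minimal sketch of that approach follows, on the assumption that each variable appears on only one screen (as in the example data). The function passed to fun.aggregate only receives the response values for each cell and cannot see question_attempt, so the filtering is done first. The name last_attempts is introduced here for illustration only.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
lesson_long <- data.table(
  id = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279,
         4256308, 4256308, 4256308, 4256308),
  screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2",
             "survey1", "survey1", "survey2", "survey2"),
  question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  variable = c("age", "country", "age", "country", "education", "course",
               "age", "country", "education", "course"),
  response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1)
)

# Step 1: for each learner and screen, keep only the rows that belong
# to the highest attempt number (last_attempts is a hypothetical name)
last_attempts <- lesson_long[
  , .SD[question_attempt == max(question_attempt)],
  by = .(id, screen)
]

# Step 2: reshape to wide. Assuming each variable lives on a single screen,
# every id/variable pair now has at most one row, so no fun.aggregate is needed
lesson_wide <- dcast(last_attempts, id ~ variable, value.var = "response")
lesson_wide
```

Filtering first keeps the reshape step free of aggregation, which makes it easier to check that only final attempts reach the wide table.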

Similarly, using dplyr and tidyr:

library(dplyr)

lesson_long %>%
  arrange(id, screen, question_attempt) %>%
  tidyr::pivot_wider(names_from = variable, values_from = response, 
                     id_cols = id, values_fn = last)
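
An explicit filtering step can also be sketched in dplyr, here with slice_max() per id and screen rather than arrange() plus last. This is one possible variant rather than the method from the answer, and it again assumes each variable appears on only one screen.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
lesson_long <- data.frame(
  id = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279,
         4256308, 4256308, 4256308, 4256308),
  screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2",
             "survey1", "survey1", "survey2", "survey2"),
  question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  variable = c("age", "country", "age", "country", "education", "course",
               "age", "country", "education", "course"),
  response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1)
)

lesson_wide <- lesson_long %>%
  group_by(id, screen) %>%
  # with_ties = TRUE keeps every row that shares the highest attempt number,
  # i.e. all variables recorded on the final attempt for that screen
  slice_max(question_attempt, n = 1, with_ties = TRUE) %>%
  ungroup() %>%
  # drop screen and question_attempt so they do not end up in the wide table
  select(id, variable, response) %>%
  pivot_wider(names_from = variable, values_from = response)

lesson_wide
```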
