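r - Is there a feature in dcast that lets me include additional conditions?

Problem description

I am trying to create a wide-format dataset that includes only some of my long-format data. The data come from learners working through an online learning module, in which they sometimes get "stuck" on a screen, so multiple attempts are recorded for that screen.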

lesson_long <- data.frame (id  = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279, 4256308, 4256308, 4256308, 4256308),
                           screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2", "survey1", "survey1", "survey2", "survey2"),
                           question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
                           variable = c("age", "country", "age", "country", "education", "course", "age", "country", "education", "course"),
                           response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1 ))

id       screen     question_attempt variable response
4256279  survey1            1           age       0
4256279  survey1            1         country     5
4256279  survey1            2           age       20
4256279  survey1            2         country     5
4256279  survey2            1        education    3
4256279  survey2            1         course      2
4256308  survey1            1           age       18
4256308  survey1            1         country     5
4256308  survey2            1        education    4
4256308  survey2            1         course      1

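For my analysis, I only need the responses from the last attempt on each screen (that is, the responses at their maximum question_attempt; some learners have as many as 8 or 9 attempts on a screen). All earlier attempts should be dropped, and I don't need the screen name in the final dataset. The final wide format should look like this: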

id        age  country education course
4256279   20     5         3         2
4256308   18     5         4         1

I have been trying to do this with dcast, without success:

lesson_wide <- dcast(lesson_long, `id` ~ variable, value.var = "response", fun.aggregate = max("question_attempt"), fill=0)

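fun.aggregate obviously doesn't work the way I made it up here... but is there a solution? Or do I need an extra step to select the data before using dcast? If so, how would I do that?

I'd love to see your answers. Thanks in advance!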

Tags: r, reshape, melt, dcast

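You can order the data by id, screen and question_attempt, and then take the last response for each id and variable. Because the rows are sorted by attempt, the last value is the one from the highest question_attempt.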

library(data.table)

setDT(lesson_long)

dcast(lesson_long[order(id, screen, question_attempt)], 
      id~variable, value.var = 'response', fun.aggregate = last, fill = NA)

#        id age country course education
#1: 4256279  20       5      2         3
#2: 4256308  18       5      1         4
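The question also asks how to select the data in a separate step before calling dcast. A minimal sketch of that approach, assuming each variable appears on only one screen per learner (true for the example data, but not guaranteed in general): keep only the rows whose question_attempt equals the maximum within each id/screen group, then reshape. The names last_attempts and lesson_wide are illustrative.

```r
library(data.table)

# Example data copied from the question
lesson_long <- data.table(
  id = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279,
         4256308, 4256308, 4256308, 4256308),
  screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2",
             "survey1", "survey1", "survey2", "survey2"),
  question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  variable = c("age", "country", "age", "country", "education", "course",
               "age", "country", "education", "course"),
  response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1)
)

# Keep only the rows from the highest attempt within each id/screen group
last_attempts <- lesson_long[
  , .SD[question_attempt == max(question_attempt)],
  by = .(id, screen)
]

# Assumption: each id/variable pair is now unique, so no fun.aggregate is needed
lesson_wide <- dcast(last_attempts, id ~ variable, value.var = "response")
lesson_wide
```

If a variable could appear on more than one screen for the same learner, dcast would fall back to counting rows (with a message), so an explicit fun.aggregate would still be needed in that case.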

Similarly, using dplyr and tidyr:

library(dplyr)

lesson_long %>%
  arrange(id, screen, question_attempt) %>%
  tidyr::pivot_wider(names_from = variable, values_from = response, 
                     id_cols = id, values_fn = last)
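An explicit filtering step can also be written with dplyr. This is only a sketch under the same assumption that each variable belongs to a single screen per learner: group by id and screen, keep the rows at the maximum question_attempt, drop the columns that are no longer needed, and pivot.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
lesson_long <- data.frame(
  id = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279,
         4256308, 4256308, 4256308, 4256308),
  screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2",
             "survey1", "survey1", "survey2", "survey2"),
  question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  variable = c("age", "country", "age", "country", "education", "course",
               "age", "country", "education", "course"),
  response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1)
)

lesson_wide <- lesson_long %>%
  group_by(id, screen) %>%
  # Keep only the rows from the highest attempt on each screen
  filter(question_attempt == max(question_attempt)) %>%
  ungroup() %>%
  # Drop screen and question_attempt so they do not end up as id columns
  select(id, variable, response) %>%
  pivot_wider(names_from = variable, values_from = response)

lesson_wide
```

Filtering first makes the selection rule visible in the pipeline instead of relying on row order, at the cost of one extra grouping step.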
