首页 > 解决方案 > 合并数据帧,以便将一个数据帧中的值插入到另一个数据帧的匹配行号中

问题描述

我想以某种方式更改数据集的格式。假设我有一个数据列表,表明参与者参加辅导课程的时间和次数。他们可以在十二周内的任何时间参加最多三个会议。说他们的数据是这样记录的

set.seed(01234)
df1 <- data.frame(id = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), length.out = 12),
                  week1 = c(sort(sample(1:12, 3, replace = F)), 
                           sort(sample(1:12, 3, replace = F)), 
                           sort(sample(1:12, 3, replace = F)), 
                           sort(sample(1:12, 3, replace = F)))) 
df1$week1[c(3,8,9,12)] <- NA # insert some NAs representing sessions that weren't attended

数据集看起来像这样

#    id  session week1
# 1   A session1     2
# 2   A session2     7
# 3   A session3    NA
# 4   B session1     7
# 5   B session2     8
# 6   B session3    10
# 7   C session1     1
# 8   C session2    NA
# 9   C session3    NA
# 10  D session1     6
# 11  D session2     7
# 12  D session3    NA

但我想要一个长数据集,每个人在他们本可以参加的十二周中的每一周都有一行,就像这样

df2 <- data.frame(id = rep(LETTERS[1:4], each = 12),
                  week2 = rep(1:12, times = 4))

所以参与者 A 的数据看起来像这样

df2[1:12,]

#    id week2
# 1   A     1
# 2   A     2
# 3   A     3
# 4   A     4
# 5   A     5
# 6   A     6
# 7   A     7
# 8   A     8
# 9   A     9
# 10  A    10
# 11  A    11
# 12  A    12

我想以某种方式合并这两者,以便week1df1 列中的数字与 df2 中的相应行匹配,理想情况下是这样的(示例仅限参与者 A)

data.frame(id = rep("A", 12),
           week = 1:12,
           attended = c(0,1,0,0,0,0,1,0,0,0,0,0))

#    id week attended
# 1   A    1        0
# 2   A    2        1
# 3   A    3        0
# 4   A    4        0
# 5   A    5        0
# 6   A    6        0
# 7   A    7        1
# 8   A    8        0
# 9   A    9        0
# 10  A   10        0
# 11  A   11        0
# 12  A   12        0

标签: r

解决方案


您可以使用扩展原始data.frame,tidyr::complete这样您就不需要合并,只需定义week1为具有正确级别数的因子:

library(dplyr)
library(tidyr)

df1 %>% 
  group_by(id) %>%
  mutate(week1 = factor(week1, levels = 1:12), 
         session = !is.na(session)) %>%
  complete(week1, fill = list(session = 0)) 

# A tibble: 52 x 3
# Groups:   id [4]
   id    week1 session
   <fct> <fct>   <dbl>
 1 A     1           0
 2 A     2           1
 3 A     3           0
 4 A     4           0
 5 A     5           0
 6 A     6           0
 7 A     7           1
 8 A     8           0
 9 A     9           0
10 A     10          0
# ... with 42 more rows

推荐阅读