Mutate with multiple conditions

Problem description

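I have a data frame in which a column needs to be conditionally recoded, within each ID, based on the dates listed in certain rows of that ID's subset. I'm trying to work out how best to do this with the mutate function from dplyr. Suggestions and alternative solutions are welcome, but I'd like to avoid a for loop.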

I know how to write a very long, inefficient for loop that solves this, but I'd like to know how to do it more efficiently.

Example data frame:

df<-data.frame(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
               date = as.Date(c("2016-02-01","2016-02-01","2016-02-01","2016-03-21", "2016-03-21", "2016-03-21", "2016-10-05", "2016-10-05", "2016-10-05", "2016-10-05", "2016-03-01","2016-03-01","2016-03-01","2016-04-21", "2016-04-21", "2016-04-21", "2016-11-05", "2016-11-05", "2016-11-05", "2016-11-05")),
               trial = c(NA, NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA, NA, NA)
)

My pseudocode is below. The second logical condition in each of the first two case_when branches is where I'm stuck.

df%>%
  group_by(ID)%>%
  mutate(results = case_when(
     is.na(trial) & date < date where trial = 1 ~ 0,
     is.na(trial) & date > date where trial = 1 ~ 2,
     trial == trial
  ))

The expected result is:

data.frame(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
               date = as.Date(c("2016-02-01","2016-02-01","2016-02-01","2016-03-21", "2016-03-21", "2016-03-21", "2016-10-05", "2016-10-05", "2016-10-05", "2016-10-05", "2016-03-01","2016-03-01","2016-03-01","2016-04-21", "2016-04-21", "2016-04-21", "2016-11-05", "2016-11-05", "2016-11-05", "2016-11-05")),
               trial = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2)
)

Tags: r, dplyr

Solution


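One option is to group by 'ID' and replace 'trial' with its run-length ID (rleid from data.table) minus 1. Within each ID the rows are in date order, so the NA run before the trial, the trial run, and the NA run after it become 0, 1, and 2.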

library(dplyr)
library(data.table)
df %>%
   group_by(ID) %>% 
   mutate(trial = rleid(trial)-1)
# A tibble: 20 x 3
# Groups:   ID [2]
#      ID date       trial
#   <dbl> <date>     <dbl>
# 1     1 2016-02-01     0
# 2     1 2016-02-01     0
# 3     1 2016-02-01     0
# 4     1 2016-03-21     1
# 5     1 2016-03-21     1
# 6     1 2016-03-21     1
# 7     1 2016-10-05     2
# 8     1 2016-10-05     2
# 9     1 2016-10-05     2
#10     1 2016-10-05     2
#11     2 2016-03-01     0
#12     2 2016-03-01     0
#13     2 2016-03-01     0
#14     2 2016-04-21     1
#15     2 2016-04-21     1
#16     2 2016-04-21     1
#17     2 2016-11-05     2
#18     2 2016-11-05     2
#19     2 2016-11-05     2
#20     2 2016-11-05     2
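
To see why subtracting 1 gives the 0/1/2 coding, it can help to look at rleid on a single small vector. This is only an illustrative sketch; the vector x and the name run_id are made up for the example and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy vector: an NA run, a run of 1s, then another NA run
x <- c(NA, NA, 1, 1, NA, NA)

# rleid() numbers consecutive runs of equal values, starting at 1;
# consecutive NAs are counted as one run
run_id <- rleid(x)

# Shift so the first run is coded 0, the second 1, the third 2
recoded <- run_id - 1
print(recoded)
```

Note that this coding depends on row order: if a group started with the trial rows instead of an NA run, the trial rows would be coded 0 rather than 1.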

Or use base R's rle, applied to is.na(trial) because rle treats every NA as a separate run:

df %>% 
  group_by(ID) %>%
  mutate(trial = with(rle(is.na(trial)), 
             rep(seq_along(values), lengths))-1)
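
A third route stays closer to the pseudocode in the question by comparing each row's date with the trial date of its ID inside case_when. This is a minimal sketch, not the answer's method: it assumes every ID has at least one row with trial equal to 1, and trial_date is a helper column name introduced here for illustration. The result is written to a new 'results' column, as in the pseudocode.

```r
library(dplyr)

# Same data as the question's example, built more compactly
df <- data.frame(
  ID = rep(c(1, 2), each = 10),
  date = as.Date(c(rep("2016-02-01", 3), rep("2016-03-21", 3), rep("2016-10-05", 4),
                   rep("2016-03-01", 3), rep("2016-04-21", 3), rep("2016-11-05", 4))),
  trial = rep(c(NA, 1, NA, NA, 1, NA), times = c(3, 3, 4, 3, 3, 4))
)

res <- df %>%
  group_by(ID) %>%
  mutate(
    # Date of the trial rows within this ID (assumes at least one exists)
    trial_date = min(date[which(trial == 1)]),
    results = case_when(
      is.na(trial) & date < trial_date ~ 0,  # rows before the trial
      is.na(trial) & date > trial_date ~ 2,  # rows after the trial
      TRUE ~ trial                           # trial rows keep their value
    )
  ) %>%
  ungroup() %>%
  select(-trial_date)

print(res)
```

Unlike the run-length approaches, this version does not depend on the rows being sorted by date, because it compares dates directly.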
