
Problem description

I want to distinguish between 3 cases:

1 - Events A and B both happened in the same session ("ID"): flag 1.
2 - Event B happened without A: flag 2.
3 - Anything else: flag 0.

For example:

ID   EVENT
1      A
1      B
2      D
2      E
2      C
3      B
4      A

I would like to get:

ID   FLAG 
1      1
2      0
3      2
4      0

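Tags: r

Solution

You can use dplyr::case_when to summarise the events of each ID into a single flag. Within each group, any and all determine whether the session contains both A and B, or only B. The solution is: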

library(dplyr) 
# Note: do not attach "plyr" after "dplyr" in the same session. Otherwise
# plyr::summarise masks dplyr::summarise, ignores the grouping, and
# returns a single row instead of one row per ID.

df %>% group_by(ID) %>%
  summarise(FLAG = case_when(
    any(EVENT == "A") & any(EVENT == "B") ~ 1,
    all(EVENT == "B")                     ~ 2,
    TRUE                                  ~ 0
  )) %>% as.data.frame()

#   ID FLAG
# 1  1    1
# 2  2    0
# 3  3    2
# 4  4    0
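
The plyr note in the code comment comes down to name masking: whichever package is attached last supplies the bare summarise. One way to make the result independent of load order is to namespace-qualify the dplyr calls. This is a minimal sketch, with the data frame built inline from the rows of the question's example; the name flags is only illustrative.

```r
library(dplyr)

# Rows taken from the example in the question
df <- data.frame(
  ID    = c(1, 1, 2, 2, 2, 3, 4),
  EVENT = c("A", "B", "D", "E", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# Explicit dplyr:: prefixes, so a later library(plyr) cannot mask these calls
flags <- df %>%
  dplyr::group_by(ID) %>%
  dplyr::summarise(FLAG = dplyr::case_when(
    any(EVENT == "A") & any(EVENT == "B") ~ 1,  # both A and B in the session
    all(EVENT == "B")                     ~ 2,  # session consists only of B
    TRUE                                  ~ 0   # everything else
  )) %>%
  as.data.frame()

print(flags)
```

This should print the same four rows as above (FLAG 1, 0, 2, 0 for ID 1 to 4).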

Data:

df <- read.table(text=
"ID   EVENT
1      A
1      B
2      D
2      E
2      C
3      B
4      A",
header = TRUE, stringsAsFactors = FALSE)
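
One caveat: all(EVENT == "B") assigns flag 2 only when every event in the session is B. If "B happened without A" is also meant to cover sessions where B occurs alongside other events (an assumption, since the question's example has no such session), the second branch could use any(EVENT == "B") instead. case_when evaluates its branches in order, so sessions with both A and B are already caught by the first branch. A sketch with hypothetical data, where session 5 is invented for illustration:

```r
library(dplyr)

# Hypothetical data: session 5 contains B together with C, but no A
df2 <- data.frame(
  ID    = c(1, 1, 3, 5, 5),
  EVENT = c("A", "B", "B", "B", "C"),
  stringsAsFactors = FALSE
)

flags2 <- df2 %>%
  group_by(ID) %>%
  summarise(FLAG = case_when(
    any(EVENT == "A") & any(EVENT == "B") ~ 1,  # both A and B
    any(EVENT == "B")                     ~ 2,  # B present, A absent (first branch failed)
    TRUE                                  ~ 0   # no B at all
  )) %>%
  as.data.frame()

print(flags2)
```

With any, session 5 gets flag 2. With the original all condition it would get flag 0, because not every event in that session is B.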
