R: Set the values of subsequent rows to NA based on a condition

Problem description

Here is my sample data frame:

library(dplyr)  # provides tibble() and the %>% pipe used below

df <- tibble(
  id = c(rep("a", 5), rep("b", 6), rep("c", 6)),
  event = c("Visit 1", "Visit 2", "Visit 3", "Visit 4", "Visit 5",
            "Visit 1", "Visit 2", "Visit 3", NA, "Visit 4", "Visit 5",
            "Visit 1", NA, "visit 2", "Visit 3", "Visit 4", "Visit 5"),
  expected_output = c("Visit 1", "Visit 2", "Visit 3", "Visit 4", "Visit 5",
                      "Visit 1", "Visit 2", "Visit 3", NA, NA, NA,
                      "Visit 1", NA, NA, NA, NA, NA)
)

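I want to create a new column such that, within each id group, once an NA appears in the event column, that row and all subsequent rows of the group are set to NA, as shown in the expected_output column.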

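Here is my attempt, which uses the lead function to create a new column expected_output_b, but it does not work. Can anyone help with this? Unfortunately, I can't think of another way to solve it.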

df <- df %>% group_by(id) %>% mutate( expected_output_b = if_else(is.na(event), all(lead(event)) == NA, event))

Thanks!

Tags: r, dataframe, dplyr

Solution


A data.table approach

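library(data.table)
# Within each id, cumsum(is.na(event)) turns positive at the first NA and stays
# positive for every later row, so those rows are replaced with NA.
# The trailing [] prints the updated table.
setDT(df)[, new := fifelse(cumsum(is.na(event)) > 0, NA_character_, event), by = id][]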
#    id   event expected_output     new
# 1:  a Visit 1         Visit 1 Visit 1
# 2:  a Visit 2         Visit 2 Visit 2
# 3:  a Visit 3         Visit 3 Visit 3
# 4:  a Visit 4         Visit 4 Visit 4
# 5:  a Visit 5         Visit 5 Visit 5
# 6:  b Visit 1         Visit 1 Visit 1
# 7:  b Visit 2         Visit 2 Visit 2
# 8:  b Visit 3         Visit 3 Visit 3
# 9:  b    <NA>            <NA>    <NA>
#10:  b Visit 4            <NA>    <NA>
#11:  b Visit 5            <NA>    <NA>
#12:  c Visit 1         Visit 1 Visit 1
#13:  c    <NA>            <NA>    <NA>
#14:  c visit 2            <NA>    <NA>
#15:  c Visit 3            <NA>    <NA>
#16:  c Visit 4            <NA>    <NA>
#17:  c Visit 5            <NA>    <NA>
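
Since the question is tagged dplyr, the same cumulative-NA idea can also be sketched with group_by() and mutate(). This is only a rough equivalent of the data.table answer, not something the original answer contains. The column name new is borrowed from that answer, and the sample data is repeated so the snippet runs on its own.

```r
library(dplyr)

# Sample data repeated from the question so this sketch is self-contained
df <- tibble(
  id = c(rep("a", 5), rep("b", 6), rep("c", 6)),
  event = c("Visit 1", "Visit 2", "Visit 3", "Visit 4", "Visit 5",
            "Visit 1", "Visit 2", "Visit 3", NA, "Visit 4", "Visit 5",
            "Visit 1", NA, "visit 2", "Visit 3", "Visit 4", "Visit 5"),
  expected_output = c("Visit 1", "Visit 2", "Visit 3", "Visit 4", "Visit 5",
                      "Visit 1", "Visit 2", "Visit 3", NA, NA, NA,
                      "Visit 1", NA, NA, NA, NA, NA)
)

result <- df %>%
  group_by(id) %>%
  # cumsum(is.na(event)) is 0 until the first NA in the group, then >= 1
  mutate(new = if_else(cumsum(is.na(event)) > 0, NA_character_, event)) %>%
  ungroup()

# Compare the computed column with the expected one from the question
identical(result$new, result$expected_output)
```

NA_character_ is used instead of a bare NA because if_else() requires both branches to have the same type as the event column.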
