How to remove rows based on multiple conditions

Problem description

I want to delete rows based on conditions in two different columns, within each group. In my case, I want to remove a "Death" that occurs at the first admission but keep a "Death" that occurs at a readmission, for each patient id.

Here is the initial data.frame:

ConditionI <- c("2017-01-01", "2018-01-01", "2018-01-15", "2018-01-20", "2018-02-01", "2018-02-1", "2018-03-01", "2018-04-01","2018-04-10")

ConditionII <- c("Death", "Alive", "Alive", "Death", "Alive", "Alive", "Death", "Alive", "Death")

id <- c("A","B","B","B","C","C","D","E","E")

df <- data.frame(id,ConditionI,ConditionII)

My goal is:

ConditionII <- c( "Alive", "Alive", "Death", "Alive", "Alive", "Alive", "Death")
ConditionI <- c( "2018-01-01", "2018-01-15", "2018-01-20", "2018-02-01", "2018-02-1", "2018-04-01","2018-04-10")
id <- c("B","B","B","C","C","E","E")

df <- data.frame(id,ConditionI,ConditionII)

I thought this was a very basic question, but I tried several times and couldn't get the answer. Your help is very much appreciated. Thanks in advance!

Tags: r

Solution


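We can use subset with duplicated from base R. !duplicated(id) marks the first row of each id, and any id whose first row is 'Death' is excluded entirely: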

subset(df,  !id %in% id[!duplicated(id) & ConditionII == 'Death'])
#   id ConditionI ConditionII
#2  B 2018-01-01       Alive
#3  B 2018-01-15       Alive
#4  B 2018-01-20       Death
#5  C 2018-02-01       Alive
#6  C  2018-02-1       Alive
#8  E 2018-04-01       Alive
#9  E 2018-04-10       Death
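
To make the one-liner easier to follow, here is a minimal sketch that splits it into intermediate steps. It rebuilds the question's data so it can run on its own. The names is_first, drop_ids, and result are illustrative, not part of the original answer, and the sketch assumes the rows are already in admission order within each id.

```r
# Rebuild the example data from the question
df <- data.frame(
  id = c("A", "B", "B", "B", "C", "C", "D", "E", "E"),
  ConditionI = c("2017-01-01", "2018-01-01", "2018-01-15", "2018-01-20",
                 "2018-02-01", "2018-02-1", "2018-03-01", "2018-04-01",
                 "2018-04-10"),
  ConditionII = c("Death", "Alive", "Alive", "Death", "Alive", "Alive",
                  "Death", "Alive", "Death"),
  stringsAsFactors = FALSE
)

# TRUE on the first row seen for each id
# (assumption: rows are already sorted by admission date within id)
is_first <- !duplicated(df$id)

# ids whose first row is a "Death"; here that is c("A", "D")
drop_ids <- df$id[is_first & df$ConditionII == "Death"]

# Keep every row whose id is not in that set
result <- df[!df$id %in% drop_ids, ]
print(result)
```

Because the filter works on ids rather than on single rows, every row belonging to a flagged id is dropped, not just the first one.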

Or with dplyr:

library(dplyr)
df %>%
    filter( !id %in% id[!duplicated(id) & ConditionII == 'Death'])
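
Both versions above drop all rows of an id whose first row is 'Death'. If the intent were to drop only that first row and keep any later rows for the same id, a row-level filter would be needed instead. The sketch below contrasts the two on a hypothetical toy data frame (toy and its values are made up for illustration and are not from the question), again assuming rows are in admission order.

```r
# Hypothetical toy data (not from the question): id "F" has "Death" on its
# first row yet also has a later row, which is where the two approaches differ
toy <- data.frame(
  id = c("E", "E", "F", "F"),
  ConditionII = c("Alive", "Death", "Death", "Alive"),
  stringsAsFactors = FALSE
)

# TRUE only for rows that are both the first of their id and a "Death"
first_death <- !duplicated(toy$id) & toy$ConditionII == "Death"

# Row-level filter: drops just the flagged row
row_level <- toy[!first_death, ]

# id-level filter (same logic as the answer): drops every row of a flagged id
id_level <- toy[!toy$id %in% toy$id[first_death], ]

print(row_level)
print(id_level)
```

On the question's own data the two filters give the same result, because the flagged ids (A and D) each have a single row.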
