r - Eliminate duplicates based on conditions from several columns in R
Problem description
This is my dataset:
df <- data.frame(PatientID = c("3454","3454","3454","345","345","345"), date = c("05/01/2001", "02/06/1997", "29/03/2004", "05/2/2021", "01/06/1960", "29/03/2003"),
infarct1 = c(TRUE, NA, TRUE, NA, NA, TRUE), infarct2 = c(TRUE, TRUE, TRUE, TRUE, NA, TRUE), stringsAsFactors = F)
Basically, I need to keep just one row per patient ID (i.e., eliminate duplicated PatientID values), based on the most recent infarct: the last row by date where infarct == TRUE, for any kind of infarct.
So the outcome I want would look like:
df <- data.frame(PatientID = c("3454","345"), date = c("29/03/2004", "05/2/2021"),
infarct = c(TRUE,TRUE), stringsAsFactors = F)
Hope this makes sense.
Thanks
Solution
Try this:
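library(dplyr)
df <- df %>%
  # Parse the dd/mm/yyyy strings so max() compares real dates, not text
  mutate(date = as.Date(date, format = "%d/%m/%Y"),
         infarct = infarct1 | infarct2) %>%
  # filter() drops rows where infarct is FALSE or NA
  filter(infarct) %>%
  group_by(PatientID, infarct) %>%
  summarise(date = max(date), .groups = "drop")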
- Create the infarct variable.
- Filter for TRUE infarct.
- Group.
- Look for the latest date.
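If the original date string should be kept as it appears in the data, a variant of the steps above can be sketched as follows. This is only a sketch under a few assumptions, not the answerer's code: it assumes dplyr 1.0.0 or later (for slice_max), assumes every date is day/month/year, and introduces a helper column date_parsed and a result name latest that are not in the original post.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  PatientID = c("3454", "3454", "3454", "345", "345", "345"),
  date = c("05/01/2001", "02/06/1997", "29/03/2004",
           "05/2/2021", "01/06/1960", "29/03/2003"),
  infarct1 = c(TRUE, NA, TRUE, NA, NA, TRUE),
  infarct2 = c(TRUE, TRUE, TRUE, TRUE, NA, TRUE),
  stringsAsFactors = FALSE
)

latest <- df %>%
  mutate(
    # Assumption: every date is day/month/year (date_parsed is a hypothetical helper)
    date_parsed = as.Date(date, format = "%d/%m/%Y"),
    # TRUE if either infarct column is TRUE; NA is treated as "no infarct"
    infarct = coalesce(infarct1, FALSE) | coalesce(infarct2, FALSE)
  ) %>%
  filter(infarct) %>%
  group_by(PatientID) %>%
  # Keep the single most recent infarct row per patient
  slice_max(date_parsed, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(PatientID, date, infarct)

print(latest)
```

Comparing parsed dates rather than the raw strings matters here: as text, "29/03/2003" sorts after "05/2/2021", so a plain max() on the character column would pick the wrong row for patient 345.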