How to group by a variable and check whether each person has another observation within a given time range in R

Problem description

I have the following data:

person_ID     visit date       
1               2/25/2001           
1               2/30/2001           
1               4/2/2001            
2               3/18/2004           
3               9/22/2004             
3               10/27/2004          
3               5/15/2008           

I want to add another column indicating whether the person has a repeat visit within 90 days, for example:

person_ID     visit date       reoccurrence
1               2/25/2001           1
1               2/30/2001           1
1               4/2/2001            0
2               3/18/2004           0
3               9/22/2004           1   
3               10/27/2004          0
3               5/15/2008           0

Any help is appreciated, thanks!

Tags: r

Solution


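Assuming the second date is not 2/30/2001 (February 30 does not exist, so the data below uses 2/28/2001), convert 'visit_date' to Date class, group by 'person_ID', take the difference in days between each 'visit_date' and the next one, check whether it is less than 90, and replace the resulting NA (each person's last visit) with 0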

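library(dplyr)
library(lubridate)
library(tidyr)
df1 <- df1 %>% 
   # parse the m/d/Y strings into Date
   mutate(visit_date = mdy(visit_date)) %>%
   group_by(person_ID) %>% 
   # 1 if the same person's next visit is less than 90 days later;
   # the last visit per person has no next visit (NA), replaced with 0
   mutate(reoccurrence = replace_na(+(difftime(lead(visit_date), 
       visit_date, units = 'days') < 90), 0)) %>% 
   ungroup()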

-output

# A tibble: 7 x 3
#  person_ID visit_date  reoccurrence
#      <int> <date>     <dbl>
#1         1 2001-02-25     1
#2         1 2001-02-28     1
#3         1 2001-04-02     0
#4         2 2004-03-18     0
#5         3 2004-09-22     1
#6         3 2004-10-27     0
#7         3 2008-05-15     0

Or using data.table

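library(data.table)
setDT(df1)[, visit_date := as.IDate(visit_date, format = '%m/%d/%Y')
     # compare each visit with the same person's next visit
     ][, reoccurrence := +(difftime(shift(visit_date, type = 'lead'), 
       visit_date, units = 'days') < 90), by = person_ID
        # the last visit per person has no next visit (NA): set it to 0
        ][is.na(reoccurrence), reoccurrence := 0L]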
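Grouping by person_ID matters: without it, the comparison with the next row can pair one person's last visit with the following person's first visit. A minimal base R sketch on a small hypothetical data set (not from the question) contrasts the ungrouped and grouped results:

```r
# Hypothetical data: person 1's last visit is 10 days before person 2's first visit
visits <- data.frame(
  person_ID  = c(1L, 1L, 2L),
  visit_date = as.Date(c("2001-01-01", "2001-06-01", "2001-06-11"))
)
n <- nrow(visits)

# Date of the next row (NA for the final row), ignoring who the person is
next_date <- c(visits$visit_date[-1], NA)
gap_days  <- as.numeric(next_date - visits$visit_date)

# Ungrouped: any next row within 90 days counts
ungrouped <- as.integer(!is.na(gap_days) & gap_days < 90)

# Grouped: the next row must also belong to the same person
same_person <- c(visits$person_ID[-1] == visits$person_ID[-n], FALSE)
grouped <- as.integer(same_person & !is.na(gap_days) & gap_days < 90)

ungrouped
# [1] 0 1 0
grouped
# [1] 0 0 0
```

The ungrouped version wrongly flags person 1's second visit because the next row belongs to person 2.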

Or with base R

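df1$visit_date <- as.Date(df1$visit_date, format = '%m/%d/%Y')
# within each person_ID: 1 if the gap to the next visit is under 90 days,
# 0 for the person's last visit
df1$reoccurrence <- with(df1, ave(as.integer(visit_date), person_ID, FUN = 
        function(x) c(+(diff(x) < 90), 0)))
df1$reoccurrence
#[1] 1 1 0 0 1 0 0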
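All three solutions compare each row with the next one, so they assume the rows are already sorted by date within each person_ID. A minimal base R sketch that sorts first and exposes the 90-day window as a parameter; `flag_reoccurrence` and its `window` argument are hypothetical names, not part of the answer:

```r
# Hypothetical helper: flag visits followed by another visit from the same
# person within `window` days. Sorting first makes the result independent
# of the input row order.
flag_reoccurrence <- function(df, window = 90) {
  df$visit_date <- as.Date(df$visit_date, format = "%m/%d/%Y")
  df <- df[order(df$person_ID, df$visit_date), ]
  # Days until the same person's next visit; NA for each person's last visit
  days_to_next <- ave(as.numeric(df$visit_date), df$person_ID,
                      FUN = function(x) c(diff(x), NA))
  df$reoccurrence <- as.integer(!is.na(days_to_next) & days_to_next < window)
  rownames(df) <- NULL
  df
}

df1 <- data.frame(
  person_ID  = c(1L, 1L, 1L, 2L, 3L, 3L, 3L),
  visit_date = c("2/25/2001", "2/28/2001", "4/2/2001", "3/18/2004",
                 "9/22/2004", "10/27/2004", "5/15/2008")
)

flag_reoccurrence(df1)$reoccurrence
# [1] 1 1 0 0 1 0 0
flag_reoccurrence(df1[7:1, ])$reoccurrence  # reversed input, same result
# [1] 1 1 0 0 1 0 0
flag_reoccurrence(df1, window = 30)$reoccurrence
# [1] 1 0 0 0 0 0 0
```

Sorting inside the helper costs one `order()` call but avoids silently wrong flags when visits arrive out of order.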

Data

df1 <- structure(list(person_ID = c(1L, 1L, 1L, 2L, 3L, 3L, 3L), visit_date = c("2/25/2001", 
"2/28/2001", "4/2/2001", "3/18/2004", "9/22/2004", "10/27/2004", 
"5/15/2008")), row.names = c(NA, -7L), class = "data.frame")
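The data above uses 2/28/2001 because the question's 2/30/2001 is not a real date. A minimal base R sketch for spotting such strings before computing gaps, using three values copied from the question:

```r
# Three visit dates as typed in the question, including the impossible 2/30/2001
raw_dates <- c("2/25/2001", "2/30/2001", "4/2/2001")

# Parsing with an explicit format returns NA for strings that are not real dates
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Positions where a non-missing string failed to parse
bad <- which(!is.na(raw_dates) & is.na(parsed))
raw_dates[bad]
# [1] "2/30/2001"
```

lubridate's `mdy()` behaves similarly, returning NA for such a string and warning that it failed to parse.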
