Filtering dates in R

Problem description

Is there a method or function to subset or filter data by ID, based on the date range of the observations? I have looked through many examples that use dplyr and lubridate.

Something similar maybe?
DF %>% 
 group_by(ID) %>% 
  filter_if(for i %in% Date, between("Date 1 & Date 2 is at least 6 months"))

or

DF %>% 
 filter_if(ID = >3 & between("Date 1 & Date 2 is at least 6 months"))

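Specifically, I want to keep the observations for an ID if at least 3 of them fall within any 6-month date range. Cohort_month could be used for this (it is extracted from the Date column).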

My DF is:


str(DF)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    25 obs. of  8 variables:
 $ ID          : chr  "AbDu" "AbDu" "AbDu" "AbDu" ...
 $ Reg         : num  29179 32039 35151 38359 41509 ...
 $ Date        : POSIXct, format: "2017-08-18" ...
 $ Year        : num  2017 2017 2017 2017 2017 ...
 $ Vol1        : num  2.5 2.5 2.5 2.5 2.5 2.5 4.9 2.5 2.5 4.9 ...
 $ Vol2        : num  2.5 2.5 2.5 2.5 2.5 2.5 4.9 2.5 2.5 4.9 ...
 $ VolT        : num  10 20 20 20 20 ...
 $ Cohort_month: num  8 9 10 11 12 1 1 3 4 11 ...

DF
# A tibble: 25 x 8
ID      Reg Date                 Year  Vol1  Vol2  VolT
<chr> <dbl> <dttm>              <dbl> <dbl> <dbl> <dbl>
AbDu  29179 2017-08-18 00:00:00  2017   2.5   2.5  10
AbDu  32039 2017-09-15 00:00:00  2017   2.5   2.5  20
AbDu  35151 2017-10-13 00:00:00  2017   2.5   2.5  20
AbDu  38359 2017-11-10 00:00:00  2017   2.5   2.5  20
AbDu  41509 2017-12-08 00:00:00  2017   2.5   2.5  20
AbDu  44732 2018-01-08 00:00:00  2018   2.5   2.5  20
AbDu  47487 2018-01-31 00:00:00  2018   4.9   4.9   9.8
AbDu  52537 2018-03-14 00:00:00  2018   2.5   2.5  30
AbDu  57713 2018-05-23 00:00:00  2018   2.5   2.5  30

Tags: r

Solution

Try this solution:

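library(tidyverse)
library(lubridate)

DF %>%
  group_by(ID) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    # For each ID, keep the rows that close a run of 3 observations
    # spanning less than 6 months (a month is approximated as 30 days)
    data_filter = map(
      data,
      ~arrange(.x, Date) %>%
        mutate(
          Date2 = lag(Date, 2),  # date of the observation two rows earlier
          MDiff = as.numeric(difftime(Date, Date2, units = "days")) / 30
        ) %>%
        filter(MDiff < 6)        # rows where Date2 is NA are dropped here
    ),
    n_row = map_dbl(
      data_filter,
      nrow
    )
  ) %>%
  filter(n_row > 0) %>%
  select(ID, data_filter) %>%
  unnest(data_filter) %>%
  # keep only the window bounds, so that ..1, ..2, ..3 are ID, Date, Date2
  select(ID, Date, Date2) %>%
  # pull every original row that falls inside each qualifying window
  pmap_df(
    ~filter(DF, ID == ..1 & Date <= ..2 & Date >= ..3)
  ) %>%
  # overlapping windows return the same row more than once
  distinct()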
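
The same "third observation back" idea can also be sketched without nesting, by flagging rows inside each ID group. This is only a minimal sketch, not the answerer's method: the toy data, the column names span_days, closes and keep, and the 180-day stand-in for "6 months" are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical toy data: ID "A" has 3 observations within 6 months, ID "B" does not
toy <- tibble(
  ID   = c("A", "A", "A", "A", "B", "B", "B"),
  Date = as.POSIXct(
    c("2017-01-10", "2017-03-05", "2017-05-20", "2018-06-01",
      "2017-01-10", "2017-09-01", "2018-05-01"),
    tz = "UTC"
  )
)

result <- toy %>%
  group_by(ID) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(
    # days between each observation and the one two rows earlier
    span_days = as.numeric(difftime(Date, lag(Date, 2), units = "days")),
    # TRUE when this row is the 3rd observation of a window shorter than 180 days
    closes = !is.na(span_days) & span_days < 180,
    # keep a row if it is the 1st, 2nd or 3rd member of such a window
    keep = closes |
      lead(closes, 1, default = FALSE) |
      lead(closes, 2, default = FALSE)
  ) %>%
  ungroup() %>%
  filter(keep) %>%
  select(ID, Date)
```

On this toy data, result keeps the first three rows of ID "A" and nothing from ID "B". Because each row is flagged in place, there are no duplicates to remove afterwards.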
