Filter out IDs within a specified date range

Problem description

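I'm trying to filter a data frame of client IDs. I want to drop the clients who appear within the first three months of my data set but do not appear again after those three months have ended, so that I'm left with only the client IDs that appear both before and after the end of the first three months. I've included some code to create a mock data set for illustration: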

    ClientId<-c('hgjj156','jksu990','ddks989','fghs676','shjk992','hddq141','huui667','kili1772','djjp8998','hdyy1122','fghs676','shjk992','hgjj156','jksu990')

    DateStamp<-c('01-01-2015', '01-01-2015', '03-01-2015', '10-01-2015', '22-01-2015', '29-01-2015','05-02-2015','11-02-2015', '19-02-2015', '17-03-2015', '02-04-2015', '06-04-2015', '08-04-2015', '09-04-2015')

    df<-cbind(ClientId, DateStamp)
    df

This should give you the following:

    ClientId   DateStamp
    "hgjj156"  "01-01-2015"
    "jksu990"  "01-01-2015"
    "ddks989"  "03-01-2015"
    "fghs676"  "10-01-2015"
    "shjk992"  "22-01-2015"
    "hddq141"  "29-01-2015"
    "huui667"  "05-02-2015"
    "kili1772" "11-02-2015"
    "djjp8998" "19-02-2015"
    "hdyy1122" "17-03-2015"
    "fghs676"  "02-04-2015"
    "shjk992"  "06-04-2015"
    "hgjj156"  "08-04-2015"
    "jksu990"  "09-04-2015"

The idea is that I would be left with the following IDs:

    ClientId   DateStamp
    "hgjj156"  "01-01-2015"
    "jksu990"  "01-01-2015"
    "fghs676"  "10-01-2015"
    "shjk992"  "22-01-2015"
    "fghs676"  "02-04-2015"
    "shjk992"  "06-04-2015"
    "hgjj156"  "08-04-2015"
    "jksu990"  "09-04-2015"

Any ideas on how I could achieve this? I've looked at dplyr and data.table solutions, but so far I haven't found one that fits.

Tags: r, date, filter

Solution


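The goal, as stated in the question: "leaving me with the client IDs that appear both before and after the first three months".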

    library(data.table)

    # formatting: convert to a data.table and parse the day-month-year strings as dates
    DT = as.data.table(df)
    DT[, DateStamp := as.IDate(DateStamp, "%d-%m-%Y")]

    # set your thresholds: the cutoff is three months after the earliest date
    d_rng = range(DT$DateStamp)
    d_dn = seq(d_rng[1], by="+3 months", length.out=2)[2]
    d_up = d_dn

    # find ids in each window (before the cutoff, and on or after it)
    c_dn = DT[DateStamp < d_dn, unique(ClientId)]
    c_up = DT[DateStamp >= d_up, unique(ClientId)]

    # filter: keep only the rows whose id appears in both windows
    DT[ClientId %in% intersect(c_dn, c_up)]

       ClientId  DateStamp
    1:  hgjj156 2015-01-01
    2:  jksu990 2015-01-01
    3:  fghs676 2015-01-10
    4:  shjk992 2015-01-22
    5:  fghs676 2015-04-02
    6:  shjk992 2015-04-06
    7:  hgjj156 2015-04-08
    8:  jksu990 2015-04-09
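
The threshold step can be checked on its own. The following is a minimal sketch in base R, assuming the earliest date in the data is 1 January 2015, as in the mock data. The names `first_seen` and `cutoff` are illustrative and do not appear in the answer.

```r
# Earliest date in the mock data (assumed here rather than computed).
first_seen <- as.Date("01-01-2015", format = "%d-%m-%Y")

# seq() with by = "3 months" steps by calendar months, so the second
# element is the first day after the three-month window.
cutoff <- seq(first_seen, by = "3 months", length.out = 2)[2]
print(cutoff)

# The two windows are half-open: dates strictly before the cutoff count as
# "before", dates on or after it count as "after", so no date is in both.
in_first_window  <- as.Date("2015-03-31") < cutoff
in_second_window <- as.Date("2015-04-01") >= cutoff
```

Using `<` for one window and `>=` for the other is what lets the answer reuse a single cutoff for both `d_dn` and `d_up`.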

I borrowed the approach for adding and removing months from @GGrothendieck's answer.
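
If data.table is not available, the same idea can be sketched in base R. This is only an illustration of the logic above, not part of the original answer. It rebuilds the mock data as a data frame, and the names `clients`, `ids_before`, `ids_after`, and `kept` are assumptions made for the example.

```r
# Rebuild the mock data as a data frame with real Date values.
ClientId <- c("hgjj156", "jksu990", "ddks989", "fghs676", "shjk992",
              "hddq141", "huui667", "kili1772", "djjp8998", "hdyy1122",
              "fghs676", "shjk992", "hgjj156", "jksu990")
DateStamp <- as.Date(
  c("01-01-2015", "01-01-2015", "03-01-2015", "10-01-2015", "22-01-2015",
    "29-01-2015", "05-02-2015", "11-02-2015", "19-02-2015", "17-03-2015",
    "02-04-2015", "06-04-2015", "08-04-2015", "09-04-2015"),
  format = "%d-%m-%Y"
)
clients <- data.frame(ClientId, DateStamp, stringsAsFactors = FALSE)

# Cutoff: three calendar months after the earliest date in the data.
cutoff <- seq(min(clients$DateStamp), by = "3 months", length.out = 2)[2]

# IDs seen before the cutoff and IDs seen on or after it.
ids_before <- unique(clients$ClientId[clients$DateStamp < cutoff])
ids_after  <- unique(clients$ClientId[clients$DateStamp >= cutoff])

# Keep every row whose ID appears in both windows.
kept <- clients[clients$ClientId %in% intersect(ids_before, ids_after), ]
print(kept)
```

On the mock data this keeps the two rows for each of the four clients that appear in both windows, matching the desired result in the question.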

