首页 > 解决方案 > 数据框R中日期时间字段的滑动窗口

问题描述

我想计算一些描述性统计数据:

数据框示例:

dt <- tibble(
    order_id = 1:10,
    customer_id = c(1, rep(2, 2), rep(3, 3), rep(4, 4)),
    amount = seq(10, 100, by = 10),
    date = c("2020-10-07 12:00", # 1st customer
             "2020-10-07 12:00", "2020-10-08 11:00", # 2st customer
             "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", # 3rd customer
             "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", "2020-10-08 21:00") # 4th customer
  ) %>% 
  mutate(
    date = lubridate::ymd_hm(date)
  )

预期结果:

order_id customer_id amount date                amount__n__last_24h amount__mean__last_24h
<int>       <dbl>  <dbl> <dttm>                            <dbl>                  <dbl>
1           1     10 2020-10-07 12:00:00                   0                     NA
2           2     20 2020-10-07 12:00:00                   0                     NA
3           2     30 2020-10-08 11:00:00                   1                     20
4           3     40 2020-10-07 12:00:00                   0                     NA
5           3     50 2020-10-08 11:00:00                   1                     40
6           3     60 2020-10-08 20:00:00                   1                     50
7           4     70 2020-10-07 12:00:00                   0                     NA
8           4     80 2020-10-08 11:00:00                   1                     70
9           4     90 2020-10-08 20:00:00                   1                     80
10          4    100 2020-10-08 21:00:00                   2                     85

我怎样才能做到这一点?

标签: rtime-seriesxtssliding-window

解决方案


这是使用的选项data.table

dt[, dayago := date - 24 * 60 * 60]
dt[, c("n", "avg") :=
    dt[dt, on=.(customer_id, date>=dayago, date<date),
        by=.EACHI, .(n=.N, avg=mean(amount))][, (1L:3L) := NULL]
    ]

数据:

library(data.table)
dt <- data.table(
    order_id = 1:10,
    customer_id = c(1, rep(2, 2), rep(3, 3), rep(4, 4)),
    amount = seq(10, 100, by = 10),
    date = as.POSIXct(c("2020-10-07 12:00", # 1st customer
        "2020-10-07 12:00", "2020-10-08 11:00", # 2st customer
        "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", # 3rd customer
        "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", "2020-10-08 21:00" # 4th customer
), format=("%Y-%m-%d %H:%M")))

推荐阅读