Aggregate IDs over time given type, with min_date and max_date

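Problem description

I am trying to aggregate dates by type over time, so that each consecutive interval of a type gets its own local min_date and max_date rather than one pair for the whole period.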

id  type    date
1   A       2019-01-01
1   A       2019-01-02
1   B       2019-01-03
1   B       2019-01-04
2   A       2019-01-01
2   A       2019-01-02
2   B       2019-01-03
2   A       2019-01-04

Code:

df <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  type = c("A", "A", "B", "B", "A", "A", "B", "A"),
  date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
                   "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"), "%Y-%m-%d")
)

The result I am trying to get:

id  type    min_date    max_date
1   A       2019-01-01  2019-01-02
1   B       2019-01-03  2019-01-04
2   A       2019-01-01  2019-01-02
2   B       2019-01-03  2019-01-03
2   A       2019-01-04  2019-01-04

I tried the code below, but it only returns the global min_date and max_date for each id and type, without accounting for changes over time:

library(dplyr)
df_changes <- df %>% group_by(id, type) %>% summarise(listings = n(), min_date = min(date), max_date = max(date))

Tags: r, dplyr

Solution


You can group by a run-length ID computed from type:

# Number each run of consecutive identical type values, then summarise per run
df %>%
 group_by(rleid = with(rle(type), rep(seq_along(lengths), lengths))) %>%
 summarise(min_date = min(date),
           max_date = max(date),
           id = first(id),
           type = first(type)) %>%
 ungroup() %>%
 select(-rleid)

  min_date   max_date      id type 
  <date>     <date>     <dbl> <chr>
1 2019-01-01 2019-01-02     1 A    
2 2019-01-03 2019-01-04     1 B    
3 2019-01-01 2019-01-02     2 A    
4 2019-01-03 2019-01-03     2 B    
5 2019-01-04 2019-01-04     2 C 
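
To make the grouping expression less opaque, here is a small base R sketch of what `with(rle(type), rep(seq_along(lengths), lengths))` computes. It uses the `type` values from the answer's sample data; the names `runs` and `run_id` are only illustrative.

```r
# type column of the sample data, in row order
type <- c("A", "A", "B", "B", "A", "A", "B", "C")

# rle() compresses the vector into runs of identical consecutive values
runs <- rle(type)
runs$values   # "A" "B" "A" "B" "C"
runs$lengths  # 2 2 2 1 1

# Repeating each run's index by its length gives one run ID per row
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id        # 1 1 2 2 3 3 4 5
```

Rows that share a run ID form one interval, which is why the second "A" block of id 2 is summarised separately from the first.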

Or, grouping by id and type as well:

df %>%
 group_by(rleid = with(rle(type), rep(seq_along(lengths), lengths)), id, type) %>%
 summarise(min_date = min(date),
           max_date = max(date)) %>%
 ungroup() %>%
 select(-rleid)
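
Grouping by id in addition to the run ID matters when one id's last type equals the next id's first type, because a run ID built from type alone would then span both ids. The sketch below illustrates this with hypothetical data (not from the question); building the key with `paste()` is just one assumed way to make a change in either column start a new run.

```r
# Hypothetical data: id 1 ends with type "B" and id 2 starts with type "B"
id   <- c(1, 1, 2, 2)
type <- c("A", "B", "B", "A")

# Run IDs from type alone: the two "B" rows share a run across ids
run_by_type <- with(rle(type), rep(seq_along(lengths), lengths))
run_by_type  # 1 2 2 3

# Run IDs from the (id, type) pair: a change in either value starts a new run
key <- paste(id, type)
run_by_pair <- with(rle(key), rep(seq_along(lengths), lengths))
run_by_pair  # 1 2 3 4
```

With the sample data in this answer the issue does not arise, since id 1 ends with "B" and id 2 starts with "A".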

Sample data:

df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2, 2), type = c("A", "A", "B", "B", "A", "A", "B", "C"), date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"), "%Y-%m-%d"),
                  stringsAsFactors = FALSE)
