Calculating a difference in days lagged across a year

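Problem description

I have a table with an id plus start and stop dates. I need to mutate a new column called "running_time_in_days" that finds, for each id, the difference between the start date in one year and the stop date in the following year (the process carries on past January 1, so each duration spans the new year).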

library(readr)
library(dplyr)
library(lubridate)

# read in some test data
test <- read_csv("id, measure, measure_date
        1, start, 1998-10-3 
        1, stop, 1999-5-1
        1, stop, 2000-4-15
        1, start, 1999-11-9
        2, stop, 2000-6-1
        2, stop, 1999-5-25
        2, start, 1999-12-1 
        2, start, 1998-11-12")
# parse "measure_date" as a Date; ymd() also accepts single-digit days such as "1998-10-3"
test$measure_date <- ymd(test$measure_date)

The output might look like this:

    id      start       stop        duration
    1   1998-10-03  1999-05-01  stop - start (days)
    1   1999-11-09  2000-04-15  stop - start (days)
    2   1998-11-12  1999-05-25  stop - start (days)
    2   1999-12-01  2000-06-01  stop - start (days)

Tags: r, tidyr, lubridate

Solution

See if this works for you:

test %>% 
  # create unique row identifier for wide form, since there are
  # multiple start / stop dates associated with each id
  group_by(id) %>% 
  arrange(measure_date) %>% 
  mutate(id2 = rep(seq(1, n()/2), each = 2)) %>% 
  ungroup() %>% 

  # convert to wide form & calculate difference between dates
  tidyr::spread(measure, measure_date) %>% 
  mutate(duration = difftime(stop, start, units = "days")) %>% 
  select(-id2)

# A tibble: 4 x 4
     id start      stop       duration
  <dbl> <date>     <date>     <drtn>  
1     1 1998-10-03 1999-05-01 210 days
2     1 1999-11-09 2000-04-15 158 days
3     2 1998-11-12 1999-05-25 194 days
4     2 1999-12-01 2000-06-01 183 days
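
As a possible variation, tidyr's documentation marks spread() as superseded in favor of pivot_wider() (available from tidyr 1.0.0). The sketch below is not part of the original answer. It assumes that, once sorted by date, each id's rows strictly alternate start, stop, and it numbers the pairs with a helper column named pair (a name chosen here for illustration). It also stores the result as a plain number in the running_time_in_days column the question asks for.

```r
library(dplyr)
library(tidyr)

# Same test data as in the question, built directly so the example is self-contained
test <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  measure = c("start", "stop", "stop", "start", "stop", "stop", "start", "start"),
  measure_date = as.Date(c("1998-10-03", "1999-05-01", "2000-04-15", "1999-11-09",
                           "2000-06-01", "1999-05-25", "1999-12-01", "1998-11-12"))
)

result <- test %>%
  arrange(id, measure_date) %>%
  group_by(id) %>%
  # every "start" opens a new pair; the "stop" that follows gets the same number
  # (assumes rows alternate start, stop within each id)
  mutate(pair = cumsum(measure == "start")) %>%
  ungroup() %>%
  # one row per id/pair, with "start" and "stop" as separate date columns
  pivot_wider(names_from = measure, values_from = measure_date) %>%
  # numeric number of days rather than a difftime object
  mutate(running_time_in_days = as.numeric(difftime(stop, start, units = "days"))) %>%
  select(-pair)

print(result)
```

If an id has an unmatched start or stop, this pairing would leave an NA in the corresponding row, so it may be worth checking the data for that before relying on the result.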
