首页 > 解决方案 > 计算不同年份的时间间隔

问题描述

我有一个包含 IN_TIME_date、OUT_TIME_date、2012-2018 期间的日期的数据集,我想计算每年的住院日,但是,有些患者的 IN_TIME_date、OUT_TIME_date 不在同一年。我该如何计算呢?非常感谢。

在此处输入图像描述

> dput(as.data.frame(Demography_newdata0129)) structure(list(CASEID = c("023252(1)", "07597558(2)", "07597558(3)", "100520(31)", "100520(32)", "100520(33)", "100520(34)", "10056(1)", "101171(4)", "101171(5)", "101455(2)", "101557(2)", "101571(3)", "101571(4)", "101571(5)", "101571(6)", "10160(5)", "101637(2)", "101893(13)", "101893(15)", "101893(16)", "102807(4)", "102807(5)", "102862(12)"), IN_TIME_date = c("2017-02-25", "2015-10-23", "2016-07-06", "2013-01-23", "2013-03-12", "2013-06-13", "2013-10-08", "2016-02-20", "2015-09-24", "2015-10-19", "2014-05-01", "2015-12-11", "2014-08-26", "2015-07-21", "2016-01-06", "2017-03-20", "2014-04-14", "2017-04-25", "2014-08-10", "2017-02-06", "2017-04-12", "2016-01-19", "2016-06-08", "2012-10-19"), OUT_TIME_date = c("2017-03-02", "2015-12-05", "2016-07-15", "2013-01-28", "2013-03-18", "2013-06-18", "2013-10-15", "2016-02-29", "2015-10-19", "2015-11-02", "2014-05-28", "2016-01-15", "2015-07-21", "2016-01-06", "43179", "2017-12-14", "2014-06-14", "2017-05-09", "2014-08-21", "2017-02-11", "2017-04-20", "2016-01-24", "2016-06-15", "2013-01-25"), LOS = c(5, 43, 9, 5, 6, 5, 7, 9, 25, 14, 27, 35, 329, 169, 804, 269, 61, 14, 11, 5, 8, 5, 7, 98 ),2012 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 73),2013 = c(0, 0, 0, 5, 6, 5, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25),2014 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 127, 0, 0, 0, 61, 0, 11, 0, 0, 0, 0, 0),2015 = c(0, 43, 0, 0, 0, 0, 0, 0, 25, 14, 0, 20, 201, 163, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),2016 = c(0, 0, 9, 0, 0, 0, 0, 9, 0, 0, 0, 15, 0, 5, 360, 0, 0, 0, 0, 0, 0, 5, 7, 0),2017 = c(5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 365, 269, 0, 14, 0, 5, 8, 0, 0, 0),2018= c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -24L), class = "data.frame")

标签: r

解决方案


首先,两个评论:

  1. 你的dput抛出错误。首先,您缺少日期列名称的单个刻度;OUT_TIME_date其次, of似乎有一个错误CASEID = 101571(5)。对于以后的帖子,请仔细检查示例数据 (1) 不包含 R 语法错误,并且 (2) 正确且具有代表性。
  2. 您的预期输出中也存在错误,其中您不包括超过一年的逗留的最后一天。

这是利用lubridate函数处理Dates 的一种选择

library(tidyverse)
library(lubridate)
df %>%
    mutate_at(vars(ends_with("date")), ~as.Date(.x, format = "%Y-%m-%d")) %>%
    mutate(
        LOS = difftime(OUT_TIME_date, IN_TIME_date, units = "days"),
        grp = map2(
            IN_TIME_date,
            OUT_TIME_date,
            ~year(seq.Date(.x, .y - 1, by = "day")))) %>%
    unnest() %>%
    group_by_all() %>%
    tally() %>%
    spread(grp, n, fill = 0) %>%
    ungroup() %>%
    as.data.frame()
#        CASEID IN_TIME_date OUT_TIME_date      LOS 2012 2013 2014 2015 2016
#1    023252(1)   2017-02-25    2017-03-02   5 days    0    0    0    0    0
#2  07597558(2)   2015-10-23    2015-12-05  43 days    0    0    0   43    0
#3  07597558(3)   2016-07-06    2016-07-15   9 days    0    0    0    0    9
#4   100520(31)   2013-01-23    2013-01-28   5 days    0    5    0    0    0
#5   100520(32)   2013-03-12    2013-03-18   6 days    0    6    0    0    0
#6   100520(33)   2013-06-13    2013-06-18   5 days    0    5    0    0    0
#7   100520(34)   2013-10-08    2013-10-15   7 days    0    7    0    0    0
#8     10056(1)   2016-02-20    2016-02-29   9 days    0    0    0    0    9
#9    101171(4)   2015-09-24    2015-10-19  25 days    0    0    0   25    0
#10   101171(5)   2015-10-19    2015-11-02  14 days    0    0    0   14    0
#11   101455(2)   2014-05-01    2014-05-28  27 days    0    0   27    0    0
#12   101557(2)   2015-12-11    2016-01-15  35 days    0    0    0   21   14
#13   101571(3)   2014-08-26    2015-07-21 329 days    0    0  128  201    0
#14   101571(4)   2015-07-21    2016-01-06 169 days    0    0    0  164    5
#15   101571(5)   2016-01-06    2018-03-20 804 days    0    0    0    0  361
#16   101571(6)   2017-03-20    2017-12-14 269 days    0    0    0    0    0
#17    10160(5)   2014-04-14    2014-06-14  61 days    0    0   61    0    0
#18   101637(2)   2017-04-25    2017-05-09  14 days    0    0    0    0    0
#19  101893(13)   2014-08-10    2014-08-21  11 days    0    0   11    0    0
#20  101893(15)   2017-02-06    2017-02-11   5 days    0    0    0    0    0
#21  101893(16)   2017-04-12    2017-04-20   8 days    0    0    0    0    0
#22   102807(4)   2016-01-19    2016-01-24   5 days    0    0    0    0    5
#23   102807(5)   2016-06-08    2016-06-15   7 days    0    0    0    0    7
#24  102862(12)   2012-10-19    2013-01-25  98 days   74   24    0    0    0
#   2017 2018
#1     5    0
#2     0    0
#3     0    0
#4     0    0
#5     0    0
#6     0    0
#7     0    0
#8     0    0
#9     0    0
#10    0    0
#11    0    0
#12    0    0
#13    0    0
#14    0    0
#15  365   78
#16  269    0
#17    0    0
#18   14    0
#19    0    0
#20    5    0
#21    8    0
#22    0    0
#23    0    0
#24    0    0

IN_TIME_date这个想法是根据和生成日期的逐日序列OUT_TIME_date;然后,我们仅从这些序列中提取年份并计算每个CASEID. 剩下的就是对数据进行基本的整理/重塑。


df <- structure(list(CASEID = c("023252(1)", "07597558(2)", "07597558(3)",
"100520(31)", "100520(32)", "100520(33)", "100520(34)", "10056(1)",
"101171(4)", "101171(5)", "101455(2)", "101557(2)", "101571(3)",
"101571(4)", "101571(5)", "101571(6)", "10160(5)", "101637(2)",
"101893(13)", "101893(15)", "101893(16)", "102807(4)", "102807(5)",
"102862(12)"), IN_TIME_date = c("2017-02-25", "2015-10-23", "2016-07-06",
"2013-01-23", "2013-03-12", "2013-06-13", "2013-10-08", "2016-02-20",
"2015-09-24", "2015-10-19", "2014-05-01", "2015-12-11", "2014-08-26",
"2015-07-21", "2016-01-06", "2017-03-20", "2014-04-14", "2017-04-25",
"2014-08-10", "2017-02-06", "2017-04-12", "2016-01-19", "2016-06-08",
"2012-10-19"), OUT_TIME_date = c("2017-03-02", "2015-12-05",
"2016-07-15", "2013-01-28", "2013-03-18", "2013-06-18", "2013-10-15",
"2016-02-29", "2015-10-19", "2015-11-02", "2014-05-28", "2016-01-15",
"2015-07-21", "2016-01-06", "2018-03-20", "2017-12-14", "2014-06-14",
"2017-05-09", "2014-08-21", "2017-02-11", "2017-04-20", "2016-01-24",
"2016-06-15", "2013-01-25")), row.names = c(NA, -24L), class = "data.frame")

推荐阅读