Group by id and count the days between the first and second, and the first and third, occurrence of each id in R

Problem description

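How can I group by id in R and compute, for each id, the number of days between its first and second occurrence and between its first and third occurrence? For example, I have the following data frame: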

CRASH_DATE  geoid           CRASH_TIME  type
2015-12-10  123             1650        Fatal_i
2015-12-06  156             1722        Fatal_i
2015-12-10  123             1956        Fatal_i
2015-11-29  156             705         Fatal_i
2015-11-21  156             1756        Fatal_i
2015-12-10  123             1936        Fatal_i
2015-11-19  156             712         Fatal_i
2015-11-21  112             1706        Fatal_i
...

I want output like this:

geoid   days_between(1,2)    days_between(1,3)
123     0                    0                 
156     2                    10                
112     NA                   NA
...
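To make the expected numbers concrete: the rows for geoid 156, once sorted by date, fall on 2015-11-19, 2015-11-21, 2015-11-29 and 2015-12-06, so the first-to-second gap is 2 days and the first-to-third gap is 10 days. A small base R check of that arithmetic, with the dates copied by hand from the sample rows above:

```r
# Crash dates recorded for geoid 156 in the sample data, in file order
d156 <- as.Date(c("2015-12-06", "2015-11-29", "2015-11-21", "2015-11-19"))

# Put them in chronological order: 2015-11-19, 2015-11-21, 2015-11-29, 2015-12-06
d156 <- sort(d156)

# Date minus Date is a difftime; convert it to a plain number of days
days_1_2 <- as.numeric(d156[2] - d156[1], units = "days")  # 1st -> 2nd occurrence: 2
days_1_3 <- as.numeric(d156[3] - d156[1], units = "days")  # 1st -> 3rd occurrence: 10
c(days_1_2, days_1_3)
```

geoid 112 has only one row, so it has no second or third occurrence and its expected result is missing (NA).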

Here is my code:

dt2  <- data.table(table)
dt22 <- dt2[, list(diff1 = CRASH_DATE - shift(CRASH_TIME, fill = first(CRASH_TIME)),
                   diff2 = CRASH_DATE - shift(CRASH_TIME, fill = first(CRASH_TIME))),
            by = c("geoid")]

But this gives the wrong result.

Tags: r, dataframe

Solution


df = read.table(text = "
CRASH_DATE  geoid           CRASH_TIME  type
2015-12-10  123             1650        Fatal_i
2015-12-06  156             1722        Fatal_i
2015-12-10  123             1956        Fatal_i
2015-11-29  156             705         Fatal_i
2015-11-21  156             1756        Fatal_i
2015-12-10  123             1936        Fatal_i
2015-11-19  156             712         Fatal_i
2015-11-21  112             1706        Fatal_i
", header = TRUE)

library(dplyr)
library(lubridate)

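df %>%
  mutate(CRASH_DATE = ymd(CRASH_DATE)) %>%  # convert to Date (if it isn't already)
  arrange(CRASH_DATE, CRASH_TIME) %>%       # chronological order; CRASH_TIME breaks same-day ties
  group_by(geoid) %>%
  # within each geoid, [1], [2], [3] are now the 1st, 2nd and 3rd occurrence;
  # indexing past the end of a group gives NA
  summarise(days_between_1_2 = as.numeric(CRASH_DATE[2] - CRASH_DATE[1]),
            days_between_1_3 = as.numeric(CRASH_DATE[3] - CRASH_DATE[1]))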

# # A tibble: 3 x 3
#   geoid days_between_1_2 days_between_1_3
#   <int>            <dbl>            <dbl>
# 1   112               NA               NA
# 2   123                0                0
# 3   156                2               10
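Since the question's attempt used data.table, here is a minimal sketch of the same logic in data.table. It assumes the data.table package is installed and that CRASH_DATE can be parsed with as.Date(); the sample rows are re-entered inline so the snippet runs on its own. Unlike the attempt in the question, it subtracts a date from a date (rather than subtracting CRASH_TIME from CRASH_DATE), and it indexes the 1st, 2nd and 3rd rows of each group instead of using shift(), which compares each row with the previous row rather than with the first.

```r
library(data.table)

# Sample rows from the question, entered inline
dt <- data.table(
  CRASH_DATE = as.Date(c("2015-12-10", "2015-12-06", "2015-12-10", "2015-11-29",
                         "2015-11-21", "2015-12-10", "2015-11-19", "2015-11-21")),
  geoid      = c(123L, 156L, 123L, 156L, 156L, 123L, 156L, 112L),
  CRASH_TIME = c(1650L, 1722L, 1956L, 705L, 1756L, 1936L, 712L, 1706L),
  type       = "Fatal_i"
)

# Sort chronologically (CRASH_TIME breaks ties on the same day), so that within
# each geoid the rows [1], [2], [3] are the 1st, 2nd and 3rd occurrence
setorder(dt, CRASH_DATE, CRASH_TIME)

# Date minus Date gives a difftime in days; indexing past the end of a group
# yields NA, so a geoid with fewer than 2 (or 3) rows gets NA.
# keyby sorts the result by geoid.
res <- dt[, .(days_between_1_2 = as.numeric(CRASH_DATE[2] - CRASH_DATE[1]),
              days_between_1_3 = as.numeric(CRASH_DATE[3] - CRASH_DATE[1])),
          keyby = geoid]
res
```

The rows of `res` should match the dplyr output above: 112 gives NA and NA, 123 gives 0 and 0, and 156 gives 2 and 10.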
