首页 > 解决方案 > 如何使用 r 在以组 id 为条件的单个列中查找多个日期之间的间隔?

问题描述

所以基本上我有一个名为 df 的数据框。df 有多个列,但我想关注的是 DT(POSIXct变量)和 CITY(字符变量)。每个 CITY 在 DT 列中都有不同数量的条目。我想创建第三列,分别是每个 CITY 的每个按时间顺序排列的 DT 之间的间隔。所以每个城市都有自己的一组日期和自己的一组天间隔。

我创建了 df 的一个子集,以便更直观地了解我正在使用的内容。

DT <- as.POSIXct(c("2019-11-02 20:00:00 CET", 
"2019-11-02 19:00:00 CET", 
"2019-11-02 20:00:00 CET", 
"2019-11-03 19:30:00 CET", 
"2019-11-03 19:30:00 CET", 
"2019-11-04 19:00:00 CET", 
"2019-11-05 19:30:00 CET", 
"2019-11-05 19:00:00 CET",
"2019-11-05 20:00:00 CET", 
"2019-11-06 19:30:00 CET", 
"2019-11-06 20:30:00 CET", 
"2019-11-06 20:30:00 CET",
"2019-11-06 19:00:00 CET", 
"2019-11-06 19:30:00 CET", 
"2019-11-08 19:30:00 CET",
"2019-11-08 20:30:00 CET", 
"2019-11-08 20:00:00 CET", 
"2019-11-08 19:00:00 CET", 
"2019-11-08 19:00:00 CET", 
"2019-11-08 19:00:00 CET", 
"2019-11-09 20:00:00 CET", 
"2019-11-10 21:30:00 CET", 
"2019-11-10 19:30:00 CET", 
"2019-11-10 18:00:00 CET", 
"2019-11-10 21:00:00 CET", 
"2019-11-11 19:30:00 CET", 
"2019-11-11 22:30:00 CET", 
"2019-11-12 21:00:00 CET",
"2019-11-12 19:00:00 CET" ))
CITY <- c("TOR", "ORL", "WAS", "DAL", "CLE", "WAS", "ATL", "CLE", "ORL", "ATL", "ORL", "DAL", "WAS", "TOR", "ATL", "DAL", "TOR", "ORL", "CLE", "WAS", "DAL", "TOR", "CLE", "ORL", "ATL", "DAL", "TOR", "ATL", "CLE")
df <- data.frame(DT, CITY)
df <- df %>% arrange(CITY)
df

我在下面创建的第三列是我想要的结果,前两列是我目前拥有的。

days <- c(NA,1,2,2,2,NA,2,3,2,2,NA,3,2,1,2,NA,3,1,2,2,NA,4,2,2,1,NA,2,2,2)
df <- data.frame(df, days_since_last_entry)
df

任何帮助将不胜感激

标签: rdatedatetime

解决方案


您可以使用以下方法执行以下操作data.table

require(data.table); setDT(df)
df[, Diff := difftime(DT, shift(DT), units = 'days'), keyby = CITY]

结果

> df
                     DT CITY           Diff
 1: 2019-11-05 19:30:00  ATL        NA days
 2: 2019-11-06 19:30:00  ATL 1.0000000 days
 3: 2019-11-08 19:30:00  ATL 2.0000000 days
 4: 2019-11-10 21:00:00  ATL 2.0625000 days
 5: 2019-11-12 21:00:00  ATL 2.0000000 days
 6: 2019-11-03 19:30:00  CLE        NA days
 7: 2019-11-05 19:00:00  CLE 1.9791667 days
 8: 2019-11-08 19:00:00  CLE 3.0000000 days
 9: 2019-11-10 19:30:00  CLE 2.0208333 days
10: 2019-11-12 19:00:00  CLE 1.9791667 days
11: 2019-11-03 19:30:00  DAL        NA days
12: 2019-11-06 20:30:00  DAL 3.0416667 days
13: 2019-11-08 20:30:00  DAL 2.0000000 days
14: 2019-11-09 20:00:00  DAL 0.9791667 days
15: 2019-11-11 19:30:00  DAL 1.9791667 days
16: 2019-11-02 19:00:00  ORL        NA days
17: 2019-11-05 20:00:00  ORL 3.0416667 days
18: 2019-11-06 20:30:00  ORL 1.0208333 days
19: 2019-11-08 19:00:00  ORL 1.9375000 days
20: 2019-11-10 18:00:00  ORL 1.9583333 days
21: 2019-11-02 20:00:00  TOR        NA days
22: 2019-11-06 19:30:00  TOR 3.9791667 days
23: 2019-11-08 20:00:00  TOR 2.0208333 days
24: 2019-11-10 21:30:00  TOR 2.0625000 days
25: 2019-11-11 22:30:00  TOR 1.0416667 days
26: 2019-11-02 20:00:00  WAS        NA days
27: 2019-11-04 19:00:00  WAS 1.9583333 days
28: 2019-11-06 19:00:00  WAS 2.0000000 days
29: 2019-11-08 19:00:00  WAS 2.0000000 days
                     DT CITY           Diff

# Verifying against provided expected output
> df[, all.equal(round(Diff), days)]
[1] TRUE

如果您想在 中进行四舍五入和/或数字条目Diff,只需difftimeround(as.numeric())


推荐阅读