r - 如何使用 r 在以组 id 为条件的单个列中查找多个日期之间的间隔?
问题描述
所以基本上我有一个名为 df 的数据框。df 有多个列,但我想关注的是 DT(POSIXct
变量)和 CITY(字符变量)。每个 CITY 在 DT 列中都有不同数量的条目。我想创建第三列,分别是每个 CITY 的每个按时间顺序排列的 DT 之间的间隔。所以每个城市都有自己的一组日期和自己的一组天间隔。
我创建了 df 的一个子集,以便更直观地了解我正在使用的内容。
DT <- as.POSIXct(c("2019-11-02 20:00:00 CET",
"2019-11-02 19:00:00 CET",
"2019-11-02 20:00:00 CET",
"2019-11-03 19:30:00 CET",
"2019-11-03 19:30:00 CET",
"2019-11-04 19:00:00 CET",
"2019-11-05 19:30:00 CET",
"2019-11-05 19:00:00 CET",
"2019-11-05 20:00:00 CET",
"2019-11-06 19:30:00 CET",
"2019-11-06 20:30:00 CET",
"2019-11-06 20:30:00 CET",
"2019-11-06 19:00:00 CET",
"2019-11-06 19:30:00 CET",
"2019-11-08 19:30:00 CET",
"2019-11-08 20:30:00 CET",
"2019-11-08 20:00:00 CET",
"2019-11-08 19:00:00 CET",
"2019-11-08 19:00:00 CET",
"2019-11-08 19:00:00 CET",
"2019-11-09 20:00:00 CET",
"2019-11-10 21:30:00 CET",
"2019-11-10 19:30:00 CET",
"2019-11-10 18:00:00 CET",
"2019-11-10 21:00:00 CET",
"2019-11-11 19:30:00 CET",
"2019-11-11 22:30:00 CET",
"2019-11-12 21:00:00 CET",
"2019-11-12 19:00:00 CET" ))
CITY <- c("TOR", "ORL", "WAS", "DAL", "CLE", "WAS", "ATL", "CLE", "ORL", "ATL", "ORL", "DAL", "WAS", "TOR", "ATL", "DAL", "TOR", "ORL", "CLE", "WAS", "DAL", "TOR", "CLE", "ORL", "ATL", "DAL", "TOR", "ATL", "CLE")
df <- data.frame(DT, CITY)
df <- df %>% arrange(CITY)
df
我在下面创建的第三列是我想要的结果,前两列是我目前拥有的。
days <- c(NA,1,2,2,2,NA,2,3,2,2,NA,3,2,1,2,NA,3,1,2,2,NA,4,2,2,1,NA,2,2,2)
df <- data.frame(df, days_since_last_entry)
df
任何帮助将不胜感激
解决方案
您可以使用以下方法执行以下操作data.table
:
require(data.table); setDT(df)
df[, Diff := difftime(DT, shift(DT), units = 'days'), keyby = CITY]
结果
> df
DT CITY Diff
1: 2019-11-05 19:30:00 ATL NA days
2: 2019-11-06 19:30:00 ATL 1.0000000 days
3: 2019-11-08 19:30:00 ATL 2.0000000 days
4: 2019-11-10 21:00:00 ATL 2.0625000 days
5: 2019-11-12 21:00:00 ATL 2.0000000 days
6: 2019-11-03 19:30:00 CLE NA days
7: 2019-11-05 19:00:00 CLE 1.9791667 days
8: 2019-11-08 19:00:00 CLE 3.0000000 days
9: 2019-11-10 19:30:00 CLE 2.0208333 days
10: 2019-11-12 19:00:00 CLE 1.9791667 days
11: 2019-11-03 19:30:00 DAL NA days
12: 2019-11-06 20:30:00 DAL 3.0416667 days
13: 2019-11-08 20:30:00 DAL 2.0000000 days
14: 2019-11-09 20:00:00 DAL 0.9791667 days
15: 2019-11-11 19:30:00 DAL 1.9791667 days
16: 2019-11-02 19:00:00 ORL NA days
17: 2019-11-05 20:00:00 ORL 3.0416667 days
18: 2019-11-06 20:30:00 ORL 1.0208333 days
19: 2019-11-08 19:00:00 ORL 1.9375000 days
20: 2019-11-10 18:00:00 ORL 1.9583333 days
21: 2019-11-02 20:00:00 TOR NA days
22: 2019-11-06 19:30:00 TOR 3.9791667 days
23: 2019-11-08 20:00:00 TOR 2.0208333 days
24: 2019-11-10 21:30:00 TOR 2.0625000 days
25: 2019-11-11 22:30:00 TOR 1.0416667 days
26: 2019-11-02 20:00:00 WAS NA days
27: 2019-11-04 19:00:00 WAS 1.9583333 days
28: 2019-11-06 19:00:00 WAS 2.0000000 days
29: 2019-11-08 19:00:00 WAS 2.0000000 days
DT CITY Diff
# Verifying against provided expected output
> df[, all.equal(round(Diff), days)]
[1] TRUE
如果您想在 中进行四舍五入和/或数字条目Diff
,只需difftime
用round(as.numeric())
推荐阅读
- angular - 模糊微调器 Angular 背后的背景
- c# - 谁能解释 XSD 版本
- android - 针对 MonoDroid v8.0 的 Mvvmcross 6.0
- python - 在python中打开和读取文件
- python-3.x - 使用另一列(熊猫)的值将一列变成两列
- api - VSTS API:如何从 PAT 获取 VSTS 用户配置文件?
- python - 使用mongocxx驱动程序在c++中查询mongodb集合并将结果集返回给调用c++程序的python
- python - 在一行中打印和 for 循环
- javascript - jQuery post() 的功能与显示/隐藏
- php - 将多个值从下拉视图发送到控制器操作