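r - Group by, then create a "break" when the datetime gap exceeds a certain time, creating a new value in the original grouping column (R, dplyr)
Problem description
I have a dataset, df: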
Subject Folder Message Date
A Out 9/9/2019 5:46:38 PM
A Out 9/9/2019 5:46:40 PM
A Out 9/9/2019 5:46:42 PM
A Out 9/9/2019 5:46:43 PM
A Out 9/9/2019 9:30:00 PM
A Out 9/9/2019 9:30:01 PM
B Out 9/9/2019 9:35:00 PM
B Out 9/9/2019 9:35:01 PM
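I am trying to group this by Subject, find the duration, and put it in a new Duration column. I would also like to apply a threshold when the gap between datetimes exceeds a certain amount of time. My dilemma is that within group A, the time jumps from 5:46 in row 4 to 9:30 in row 5, which gives an inaccurate duration for group A. When the gap exceeds 10 minutes, I would like to "break" at that point and compute a new duration, while creating a new value (A1) in Subject. I am not sure whether I should use a loop for this?
The output I want is: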
Subject Duration Group
A 5 sec outdata1
A1 1 sec outdata2
B 1 sec outdata3
Here is my input:
structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L), .Label = c("A", "B"), class = "factor"), Folder = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Out", class = "factor"),
Message = c("", "", "", "", "", "", "", ""), Date = structure(1:8, .Label = c("9/9/2019 5:46:38 PM",
"9/9/2019 5:46:40 PM", "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM",
"9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM", "9/9/2019 9:35:00 PM",
"9/9/2019 9:35:01 PM"), class = "factor")), row.names = c(NA,
-8L), class = "data.frame")
Here is what I have tried:
thresh <- duration(10, units = "minutes")
df %>%
mutate(Date = mdy_hms(Date)) %>%
transmute(Subject, Duration = diff = difftime(as.POSIXct(Date, format =
"%m/%d/%Y %I:%M:%S %p"),as.POSIXct(Date,
format = "%m/%d/%Y %I:%M:%S %p" ), units = "secs")) %>%
ungroup %>%
distinct %>%
mutate(grp = str_c("Outdata", row_number()))
mutate(delta = if_else(grp < thresh1, grp, NA_real_))
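Solution
We can compute the time difference between consecutive Date values to create new groups, and then compute the difference between the max and min Date within each group.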
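The grouping step can be seen in isolation on a bare vector of timestamps. This is only a minimal sketch in base R: the vector `times` and the names `gap_mins` and `run_id` are made up for illustration and are not part of the original answer.

```r
# Hypothetical timestamps: two bursts separated by a long gap
times <- as.POSIXct(c("2019-09-09 17:46:38", "2019-09-09 17:46:40",
                      "2019-09-09 21:30:00", "2019-09-09 21:30:01"),
                    tz = "UTC")

# Gap in minutes to the previous timestamp; the first element gets 0
gap_mins <- c(0, as.numeric(diff(times), units = "mins"))

# TRUE marks the first row after a gap longer than 10 minutes;
# cumsum() turns those flags into a run counter
run_id <- cumsum(gap_mins > 10)

print(run_id)
# [1] 0 0 1 1
```

Every row shares a `run_id` with its neighbours until a gap above the threshold bumps the counter, so the counter can be used directly as a grouping key.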
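library(dplyr)

# Gap threshold in minutes; a larger gap starts a new group
thresh <- 10

df %>%
  # Parse the 12-hour timestamps into POSIXct
  mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p")) %>%
  # The counter increases each time the gap to the previous row exceeds thresh
  group_by(Subject, Group = cumsum(difftime(Date,
      lag(Date, default = first(Date)), units = "mins") > thresh)) %>%
  # Duration of each run: last timestamp minus first
  summarise(Duration = difftime(max(Date), min(Date), units = "secs")) %>%
  ungroup() %>%
  # Label the runs in order of appearance
  mutate(Group = paste0('outdata', row_number()))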
# A tibble: 3 x 3
# Subject Group Duration
# <fct> <chr> <drtn>
#1 A outdata1 5 secs
#2 A outdata2 1 secs
#3 B outdata3 1 secs
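The result above keeps Subject as A for both runs, whereas the question asks for A and A1. One possible way to get that label is sketched below. It is a variation on the answer, not the answer author's code: the column name `run` and the result name `res` are hypothetical, the data frame is a cut-down stand-in for the question's df, the `.groups` argument assumes dplyr 1.0.0 or later, and parsing `%p` assumes a locale that recognises AM/PM.

```r
library(dplyr)

# Small stand-in for the question's df (Subject and Date only)
df <- data.frame(
  Subject = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Date = c("9/9/2019 5:46:38 PM", "9/9/2019 5:46:40 PM",
           "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM",
           "9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM",
           "9/9/2019 9:35:00 PM", "9/9/2019 9:35:01 PM"),
  stringsAsFactors = FALSE
)

thresh <- 10  # minutes

res <- df %>%
  mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p",
                           tz = "UTC")) %>%
  group_by(Subject) %>%
  # Run counter restarts at 0 for every Subject
  mutate(run = cumsum(difftime(Date, lag(Date, default = first(Date)),
                               units = "mins") > thresh)) %>%
  group_by(Subject, run) %>%
  summarise(Duration = difftime(max(Date), min(Date), units = "secs"),
            .groups = "drop") %>%
  # First run keeps the plain Subject; later runs get the run number appended
  mutate(Subject = if_else(run == 0, Subject, paste0(Subject, run)),
         Group = paste0("outdata", row_number())) %>%
  select(Subject, Duration, Group)

print(res)
```

With this input, `res` has the Subject values A, A1 and B with durations of 5, 1 and 1 seconds. Computing the counter inside `group_by(Subject)` means a gap is only ever measured between rows of the same Subject, so the suffix always counts breaks within that Subject.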