r - R时间间隔:当样本超过小时标记时按一天中的小时分组
问题描述
我有两种鸟类行为持续时间的视频数据,当鸟在巢上和鸟离开巢时。就我的分析而言,我需要每小时计算出入巢和出巢的持续时间。但是,通常不同的行为会与小时标记重叠。例如,鸟儿在 4:10-4:42 和 4:50 - 5:20 在巢上,我需要将第二个时段分开为 4:50-5:00 和 5:00-5:20,以便我可以每小时计算。我已经用 lubridate 包寻找了相当长的一段时间,但没有看到这样做的方法,但认为那里一定有什么东西。有什么建议么?
示例数据如下。“off.time.diff”是“off.bout.id”之间的秒差,与“on.time.diff”相同。在这里举个例子,这只鸟从 17:25:39 到 18:03:29 开始工作。我可以得到总时间(2270 秒),但不知道如何将其每小时分开。
Event DT.event off.bout.ID on.bout.ID off.time.diff on.time.diff
off 4/27/12 17:25:13 1 0 NA NA
on 4/27/12 17:25:39 1 1 26 NA
off 4/27/12 18:03:29 2 1 NA 2270
on 4/27/12 18:03:57 2 2 28 NA
off 4/27/12 19:41:16 3 2 NA 5839
on 4/27/12 19:43:50 3 3 154 NA
off 4/28/12 6:23:57 4 3 NA 38407
on 4/28/12 6:32:13 4 4 496 NA
off 4/28/12 6:40:20 5 4 NA 487
on 4/28/12 6:40:48 5 5 28 NA
off 4/28/12 8:16:07 6 5 NA 5719
解决方案
我的建议背后的想法是检查每个事件有多少整小时标记已通过,并为每个小时插入一个额外的行并相应地更改时间......
加载示例数据:
df <- read.table(text='Event DT.event off.bout.ID on.bout.ID off.time.diff on.time.diff
off 4/27/12-17:25:13 1 0 NA NA
on 4/27/12-17:25:39 1 1 26 NA
off 4/27/12-18:03:29 2 1 NA 2270
on 4/27/12-18:03:57 2 2 28 NA
off 4/27/12-19:41:16 3 2 NA 5839
on 4/27/12-19:43:50 3 3 154 NA
off 4/28/12-6:23:57 4 3 NA 38407
on 4/28/12-6:32:13 4 4 496 NA
off 4/28/12-6:40:20 5 4 NA 487
on 4/28/12-6:40:48 5 5 28 NA
off 4/28/12-8:16:07 6 5 NA 5719', header=T, stringsAsFactors=F)
设置日期时间变量。必要时调整tz
参数:
df$DT.event <- as.POSIXct(df$DT.event, format = "%m/%d/%y-%H:%M:%S")
library(dplyr)
library(tidyr)
# reshape data
#
df2 <- df %>%
select(Event, DT.event, on.bout.ID) %>%
pivot_wider(names_from = Event,
values_from = DT.event) %>%
select(on.bout.ID, on, off)
df2
是df
更广泛形式的一些信息:
on.bout.ID on off <int> <dttm> <dttm> 1 0 NA 2012-04-27 17:25:13 2 1 2012-04-27 17:25:39 2012-04-27 18:03:29 3 2 2012-04-27 18:03:57 2012-04-27 19:41:16 4 3 2012-04-27 19:43:50 2012-04-28 06:23:57 5 4 2012-04-28 06:32:13 2012-04-28 06:40:20 6 5 2012-04-28 06:40:48 2012-04-28 08:16:07
# Make a copy so we don't mutate the object we are using to iterate
#
df3 <- df2
for (i in seq_along(df2$on.bout.ID)) {
# extract current iterations start and end time
#
id <- df2$on.bout.ID[i]
from <- df2$on[i]
to <- df2$off[i]
# calculate number of rows to insert
#
hoursDiff <- as.numeric(format(to, "%H")) - as.numeric(format(from , "%H"))
# compensate for crossing of midnight (00:00AM)
# by adding 24
#
hoursDiff <- as.integer(difftime(as.Date(to), as.Date(from), unit="days")) * 24 + hoursDiff
# if there is at least on pass of the full hour, insert a copy of the
# current row but adapt on and off times
#
if (!is.na(hoursDiff) & hoursDiff > 0) {
for (hour in 1:hoursDiff) {
# startime of this additional row
#
fromTime <- as.POSIXct(paste0(format(from + 3600 * hour, "%m/%d/%y-%H"), ":00:00"), format="%m/%d/%y-%H:%M:%S")
# Maximal endtime of this additional row
#
toTime <- fromTime + 3599
# copy current line
#
insert <- df2[i, ]
# set start time for this new row to full hour
#
insert$on <- fromTime
# if this is the last row to insert do NOT adapt off time
#
if (!(toTime > to)) {
insert$off <- toTime
}
# add additional row
#
df3 <- rbind(df3, insert)
}
# set off-time for the current line to end of first hour
#
df3[df3$on.bout.ID == id & df3$on == from & df3$off == to,]$off <- as.POSIXct(paste0(format(from, "%m/%d/%y-%H"), ":59:59"), format="%m/%d/%y-%H:%M:%S")
}
}
# Use `dplyr` to sort result
#
library(dplyr)
df3 %>% arrange(on.bout.ID, on)
# A tibble: 21 x 3 on.bout.ID on off <int> <dttm> <dttm> 1 0 NA 2012-04-27 17:25:13 2 1 2012-04-27 17:25:39 2012-04-27 17:59:59 3 1 2012-04-27 18:00:00 2012-04-27 18:03:29 4 2 2012-04-27 18:03:57 2012-04-27 18:59:59 5 2 2012-04-27 19:00:00 2012-04-27 19:41:16 6 3 2012-04-27 19:43:50 2012-04-27 19:59:59 7 3 2012-04-27 20:00:00 2012-04-27 20:59:59 8 3 2012-04-27 21:00:00 2012-04-27 21:59:59 9 3 2012-04-27 22:00:00 2012-04-27 22:59:59 10 3 2012-04-27 23:00:00 2012-04-27 23:59:59 # … with 11 more rows
漂亮吗?不!它有效吗?我认同
编辑:
添加
hoursDiff <- as.integer(difftime(as.Date(to), as.Date(from), unit="days")) * 24 + hoursDiff
扩展午夜穿越的功能
推荐阅读
- sql - 将 2 列,同一个表链接到另一个表
- entity-framework-core - EF Core 自引用多对多双方
- python - 如何在 Python 中为多个变量快速添加 1?
- elasticsearch - 将日志 prestashop 发送到 kibana
- .net-core - 我可以做些什么来加快我使用 NBomber 的负载测试?(VS LT 250 RPS 轻松;NBomber 最高 25 RPS)
- python - 熊猫:将元素与另一个数据框进行比较
- hadoop - Hive/Impala 写入 HDFS
- bash - 将命令历史记录到 docker 主机
- shell - 使用 find 查找名称中包含不可打印字符的文件
- javascript - 为快速路由器创建装饰器?