首页 > 解决方案 > R时间间隔:当样本超过小时标记时按一天中的小时分组

问题描述

我有两种鸟类行为持续时间的视频数据,当鸟在巢上和鸟离开巢时。就我的分析而言,我需要每小时计算出入巢和出巢的持续时间。但是,通常不同的行为会与小时标记重叠。例如,鸟儿在 4:10-4:42 和 4:50 - 5:20 在巢上,我需要将第二个时段分开为 4:50-5:00 和 5:00-5:20,以便我可以每小时计算。我已经用 lubridate 包寻找了相当长的一段时间,但没有看到这样做的方法,但认为那里一定有什么东西。有什么建议么?

示例数据如下。“off.time.diff”是“off.bout.id”之间的秒差,与“on.time.diff”相同。在这里举个例子,这只鸟从 17:25:39 到 18:03:29 开始工作。我可以得到总时间(2270 秒),但不知道如何将其每小时分开。

Event   DT.event        off.bout.ID  on.bout.ID  off.time.diff  on.time.diff
off     4/27/12 17:25:13    1          0           NA               NA
on      4/27/12 17:25:39    1          1           26               NA
off     4/27/12 18:03:29    2          1           NA              2270
on      4/27/12 18:03:57    2          2           28               NA
off     4/27/12 19:41:16    3          2           NA              5839
on      4/27/12 19:43:50    3          3           154              NA
off     4/28/12 6:23:57     4          3           NA              38407
on      4/28/12 6:32:13     4          4           496              NA
off     4/28/12 6:40:20     5          4           NA              487
on      4/28/12 6:40:48     5          5           28               NA
off     4/28/12 8:16:07     6          5           NA              5719

标签: rdatetimetimeintervalslubridate

解决方案


我的建议背后的想法是检查每个事件有多少整小时标记已通过,并为每个小时插入一个额外的行并相应地更改时间......

加载示例数据:

df <- read.table(text='Event   DT.event        off.bout.ID  on.bout.ID  off.time.diff  on.time.diff
off     4/27/12-17:25:13    1          0           NA               NA
on      4/27/12-17:25:39    1          1           26               NA
off     4/27/12-18:03:29    2          1           NA              2270
on      4/27/12-18:03:57    2          2           28               NA
off     4/27/12-19:41:16    3          2           NA              5839
on      4/27/12-19:43:50    3          3           154              NA
off     4/28/12-6:23:57     4          3           NA              38407
on      4/28/12-6:32:13     4          4           496              NA
off     4/28/12-6:40:20     5          4           NA              487
on      4/28/12-6:40:48     5          5           28               NA
off     4/28/12-8:16:07     6          5           NA              5719', header=T, stringsAsFactors=F)

设置日期时间变量。必要时调整tz参数:

df$DT.event <- as.POSIXct(df$DT.event, format = "%m/%d/%y-%H:%M:%S")


library(dplyr)
library(tidyr)

# reshape data
# 
df2 <- df %>%
  select(Event, DT.event, on.bout.ID) %>% 
  pivot_wider(names_from = Event,
              values_from = DT.event) %>% 
  select(on.bout.ID, on, off)

df2df更广泛形式的一些信息:

  on.bout.ID on                  off                
       <int> <dttm>              <dttm>             
1          0 NA                  2012-04-27 17:25:13
2          1 2012-04-27 17:25:39 2012-04-27 18:03:29
3          2 2012-04-27 18:03:57 2012-04-27 19:41:16
4          3 2012-04-27 19:43:50 2012-04-28 06:23:57
5          4 2012-04-28 06:32:13 2012-04-28 06:40:20
6          5 2012-04-28 06:40:48 2012-04-28 08:16:07
# Make a copy so we don't mutate the object we are using to iterate
#
df3 <- df2

for (i in seq_along(df2$on.bout.ID)) {

  # extract current iterations start and end time
  # 
  id <- df2$on.bout.ID[i]
  from <- df2$on[i]
  to <- df2$off[i]

  # calculate number of rows to insert
  # 
  hoursDiff <- as.numeric(format(to, "%H")) - as.numeric(format(from , "%H"))

  # compensate for crossing of midnight (00:00AM)
  # by adding 24
  #
  hoursDiff <- as.integer(difftime(as.Date(to), as.Date(from), unit="days")) * 24 + hoursDiff

  # if there is at least on pass of the full hour, insert a copy of the
  # current row but adapt on and off times
  # 
  if (!is.na(hoursDiff) & hoursDiff > 0) {
    for (hour in 1:hoursDiff) {

      # startime of this additional row
      # 
      fromTime <- as.POSIXct(paste0(format(from  + 3600 * hour, "%m/%d/%y-%H"), ":00:00"), format="%m/%d/%y-%H:%M:%S")

      # Maximal endtime of this additional row
      # 
      toTime <- fromTime + 3599

      # copy current line
      # 
      insert <- df2[i, ]

      # set start time for this new row to full hour
      #
      insert$on <- fromTime

      # if this is the last row to insert do NOT adapt off time
      # 
      if (!(toTime > to)) {
        insert$off <- toTime
      } 

      # add additional row
      # 
      df3 <- rbind(df3, insert)
    }

  # set off-time for the current line to end of first hour
  # 
  df3[df3$on.bout.ID == id & df3$on == from & df3$off == to,]$off <-  as.POSIXct(paste0(format(from, "%m/%d/%y-%H"), ":59:59"), format="%m/%d/%y-%H:%M:%S")
  }
}

# Use `dplyr` to sort result
#
library(dplyr)    
df3 %>% arrange(on.bout.ID, on)
    # A tibble: 21 x 3
      on.bout.ID on                  off                
           <int> <dttm>              <dttm>             
    1          0 NA                  2012-04-27 17:25:13
    2          1 2012-04-27 17:25:39 2012-04-27 17:59:59
    3          1 2012-04-27 18:00:00 2012-04-27 18:03:29
    4          2 2012-04-27 18:03:57 2012-04-27 18:59:59
    5          2 2012-04-27 19:00:00 2012-04-27 19:41:16
    6          3 2012-04-27 19:43:50 2012-04-27 19:59:59
    7          3 2012-04-27 20:00:00 2012-04-27 20:59:59
    8          3 2012-04-27 21:00:00 2012-04-27 21:59:59
    9          3 2012-04-27 22:00:00 2012-04-27 22:59:59
    10          3 2012-04-27 23:00:00 2012-04-27 23:59:59
    # … with 11 more rows

漂亮吗?不!它有效吗?我认同

编辑:

添加

 hoursDiff <- as.integer(difftime(as.Date(to), as.Date(from), unit="days")) * 24 + hoursDiff

扩展午夜穿越的功能


推荐阅读