首页 > 解决方案 > 根据其他数据帧中指定的不同时间范围对数据帧进行子集

问题描述

我这里有两个数据框。

SCORE_df <- data.frame("Participant" = rep(1:2, times=1, each=7), "Score" = as.integer(runif(14, 0, 100)), "Time" = c('17:00:00', '17:00:01', '17:00:02', '17:00:03', '17:00:04', '17:00:05', '17:00:06', '19:50:30', '19:50:31', '19:50:32', '19:50:33', '19:50:34', '19:50:35', '19:50:36'))
              
TIME_df <- data.frame("Participant" = c(1,2), "Start" = c('17:00:02', '19:50:31'), "End" = c('17:00:05', '19:50:33'))

> SCORE_df
   Participant Score     Time
1            1    56 17:00:00
2            1    77 17:00:01
3            1    27 17:00:02
4            1    78 17:00:03
5            1    46 17:00:04
6            1    22 17:00:05
7            1    35 17:00:06
8            2    26 19:50:30
9            2    64 19:50:31
10           2    29 19:50:32
11           2    29 19:50:33
12           2    90 19:50:34
13           2     0 19:50:35
14           2    51 19:50:36

> TIME_df
  Participant    Start      End
1           1 17:00:02 17:00:05
2           2 19:50:31 19:50:33

我想使用 TIME_df 中的开始和结束信息来子集 SCORE_df 数据——即,只保留“每个”参与者的“开始到结束”时间(包括)“内”的分数数据。

下面的行做了错误的子集。

for(p in 1:nrow(SCORE_df)){
  ppt <- SCORE_df$Participant[p]
  SCORE_df_trimmed <- with(SCORE_df[which(SCORE_df$Participant==ppt),], SCORE_df[strptime(Time, "%H:%M:%S") >= strptime(TIME_df$Start[TIME_df$Participant==ppt], "%H:%M:%S") & strptime(Time, "%H:%M:%S") <= strptime(TIME_df$End[TIME_df$Participant==ppt], "%H:%M:%S"),])
}

> SCORE_df_trimmed
   Participant Score     Time
2            1    77 17:00:01
3            1    27 17:00:02
4            1    78 17:00:03
9            2    64 19:50:31
10           2    29 19:50:32
11           2    29 19:50:33

如果有人能查明上述行中的错误,我将不胜感激。

标签: rdataframetimesubset

解决方案


一个选项data.table

library(data.table)
setDT(SCORE_df)
setDT(TIME_df)

# POSIXct works with a date reference, so I add in just any date
SCORE_df[, Time := as.POSIXct(paste0("2000-01-01 ", Time), tz = "UTC")]
TIME_df[,  Start := as.POSIXct(paste0("2000-01-01 ", Start), tz = "UTC")]
TIME_df[,  End := as.POSIXct(paste0("2000-01-01 ", End), tz = "UTC")]


SCORE_df[TIME_df, 
         on = .(Participant = Participant, Time >= Start, Time <= End), 
         .(Participant, Score, Time = x.Time)]


#    Participant Score                Time
# 1:           1    49 2000-01-01 17:00:02
# 2:           1    75 2000-01-01 17:00:03
# 3:           1     8 2000-01-01 17:00:04
# 4:           1     7 2000-01-01 17:00:05
# 5:           2    49 2000-01-01 19:50:31
# 6:           2    59 2000-01-01 19:50:32
# 7:           2    13 2000-01-01 19:50:33

推荐阅读