How to calculate time overlap between different string variables (in columns) in R?

Problem description

I want to create a new column that gives a 0/1 (or FALSE/TRUE) value whenever a Question (A/B) and a Gaze (away) overlap in time.

Here is a dummy example of the data, with time values in milliseconds:

Begin End Duration Question Gaze  File
1145  1245 100       A            xx_xx_A_xx
1050  1100 50        B            xx_xx_A_xx
1200  1300 100              away  xx_xx_A_xx
1100  1200 100       A            xx_xx_B_xx
1200  1245  45       B            xx_xx_B_xx
1300  1400 100              away  xx_xx_B_xx

I would like to obtain a new column, Q_Gazeoverlap:

Begin End Duration Question Gaze  File        Q_Gazeoverlap
1145  1245 100       A            xx_xx_A_xx  
1050  1100 50        B            xx_xx_A_xx
1200  1300 100              away  xx_xx_A_xx   1
1100  1200 100       A            xx_xx_B_xx
1200  1245  45       B            xx_xx_B_xx
1300  1400 100              away  xx_xx_B_xx   0

The code I have come up with so far is:

data$Q_Gazeoverlap <- 0

for(i in 1:nrow(data)){ 
  if(data$Question[i] == "A" | data$Question[i] == "B"){ 
    QuesBegin <- data$Begin[i]
    QuesEnd <- data$End[i] 
    
    for(j in 1:nrow(data)){
      if(data$Gaze[j] == "away"){ 
      GazeBegin <- data$Begin[j]
      GazeEnd <- data$End[j]
    
  
      if (data$File[i] == data$File[j] & data$Question[i] == strsplit(data$File,"_")[[i]][3]){ # check if its the right file, and checks if Question is same as A/B in File by looking at 3rd part of string
          ifelse((GazeBegin < QuesBegin | GazeBegin == QuesBegin) & (GazeEnd < QuesEnd | GazeEnd == QuesEnd) |
           (GazeEnd > QuesEnd | GazeEnd == QuesEnd) |
           (GazeBegin > QuesBegin | GazeBegin == QuesBegin) & (GazeBegin < QuesEnd | GazeBegin == QuesEnd),
         data$Q_Gazeoverlap[j] <- 1, 
         data$Q_Gazeoverlap[j] <- "0" )
      }}}}}

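But somehow I keep getting a value of 1 for every "away" annotation, instead of only for those that overlap with the A/B question belonging to the file being examined.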

Does anyone have experience with calculating overlaps? Any help is much appreciated!

Tags: r, string, overlap

Solution


Here is a data.table approach.

Sample data used

library( data.table)
DT <- data.table::fread("Begin, End, Duration, Question, Gaze,  File
1145,1245,100,A,,xx_xx_A_xx
1050,1100,50,B,,xx_xx_A_xx
1200,1300,100,,away,xx_xx_A_xx
1100,1200,100,A,,xx_xx_B_xx
1200,1245,45,B,,xx_xx_B_xx
1300,1400,100,,away,xx_xx_B_xx")

Code

This can be done in a (rather long) data.table one-liner.

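For each row with !Gaze == "", it counts the rows with !Question == "" that overlap it. If that count is greater than zero, it returns 1, otherwise 0. Rows without a Gaze value are left as NA.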

DT[ !Gaze == "", Q_Gazeoverlap := as.numeric( 
  DT[ !Question == "", ][ DT[ !Gaze == "", ], 
                          on = .(Begin <= End, End >= Begin),
                          .N,
                          by = .EACHI, 
                          allow.cartesian = TRUE ]$N > 0 ) ][]


#    Begin  End Duration Question Gaze       File Q_Gazeoverlap
# 1:  1145 1245      100        A      xx_xx_A_xx            NA
# 2:  1050 1100       50        B      xx_xx_A_xx            NA
# 3:  1200 1300      100          away xx_xx_A_xx             1
# 4:  1100 1200      100        A      xx_xx_B_xx            NA
# 5:  1200 1245       45        B      xx_xx_B_xx            NA
# 6:  1300 1400      100          away xx_xx_B_xx             0
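
To see what the join inside the one-liner produces, its inner part can be run on its own. The following is only a sketch: the sample data is rebuilt with data.table() so the block is self-contained, and the names questions, gazes and counts are introduced here for illustration and do not appear in the answer above.

```r
library(data.table)

# Same sample data as above, built directly instead of via fread()
DT <- data.table(
  Begin    = c(1145, 1050, 1200, 1100, 1200, 1300),
  End      = c(1245, 1100, 1300, 1200, 1245, 1400),
  Question = c("A", "B", "", "A", "B", ""),
  Gaze     = c("", "", "away", "", "", "away"),
  File     = rep(c("xx_xx_A_xx", "xx_xx_B_xx"), each = 3)
)

questions <- DT[Question != ""]  # illustrative name: rows with a question
gazes     <- DT[Gaze != ""]      # illustrative name: rows with a gaze

# Non-equi join: for every gaze row, count the question rows whose
# Begin <= gaze End and whose End >= gaze Begin (inclusive overlap).
# With by = .EACHI, .N is 0 for a gaze row that matches nothing.
counts <- questions[gazes,
                    on = .(Begin <= End, End >= Begin),
                    .N,
                    by = .EACHI]

counts$N > 0
```

The logical vector has one element per gaze row, in the order of the gaze rows, which is why it can be assigned straight into the subset DT[ !Gaze == "", ... ].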

If you also want to join on File, just add the File field to the on argument, like this:

DT[ !Gaze == "", Q_Gazeoverlap := as.numeric( 
      DT[ !Question == "", ][ DT[ !Gaze == "", ], 
                              on = .(Begin <= End, End >= Begin, File),
                              .N,
                              by = .EACHI, 
                              allow.cartesian = TRUE ]$N > 0 ) ]
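
For comparison, the same inclusive overlap test can be written without data.table. This is a minimal base R sketch and not part of the original answer; has_overlap is a hypothetical helper name, and the data frame simply mirrors the sample data. It relies on the fact that two closed intervals overlap exactly when each one begins no later than the other ends.

```r
df <- data.frame(
  Begin    = c(1145, 1050, 1200, 1100, 1200, 1300),
  End      = c(1245, 1100, 1300, 1200, 1245, 1400),
  Question = c("A", "B", "", "A", "B", ""),
  Gaze     = c("", "", "away", "", "", "away"),
  File     = rep(c("xx_xx_A_xx", "xx_xx_B_xx"), each = 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if gaze row j overlaps any question row
# that belongs to the same File
has_overlap <- function(j, data) {
  q <- data[data$Question != "" & data$File == data$File[j], ]
  any(q$Begin <= data$End[j] & q$End >= data$Begin[j])
}

gaze_rows <- which(df$Gaze == "away")

# Non-gaze rows stay NA, gaze rows get 1 (overlap) or 0 (no overlap)
df$Q_Gazeoverlap <- NA_integer_
df$Q_Gazeoverlap[gaze_rows] <- as.integer(
  vapply(gaze_rows, has_overlap, logical(1), data = df)
)

df$Q_Gazeoverlap
```

If only the question whose letter matches the third part of File should count, as the loop in the question intends, the filter for q could additionally require data$Question == strsplit(data$File[j], "_")[[1]][3]. Note also that in the question's loop the middle clause (GazeEnd > QuesEnd | GazeEnd == QuesEnd) stands alone between two | operators, so any gaze that ends at or after the question's end is flagged, which would explain the 1 on every "away" row.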
