首页 > 解决方案 > R:根据前一行合并连续行

问题描述

我正在使用 R 处理说话时间数据集(时间序列)。该数据显示了不同说话者的进出时间。我想(1)按时间顺序组合属于同一扬声器的连续行,(2)将时间保留在In第一行的列中,并将时间保留Out在新合并行的最后一列中。有人可以指导我如何解决这个问题吗?非常感谢!

原始数据:

structure(list(In = c("15:22.5", "15:31.2", "15:38.1", "15:45.4", 
"15:52.2", "16:11.0", "16:23.4", "16:35.3", "16:36.9", "16:47.4", 
"17:06.0", "17:11.2", "17:18.7", "17:26.7", "17:34.6", "18:05.0", 
"18:17.5", "18:28.9", "18:32.4", "19:00.4", "19:41.3", "20:01.6"
), Out = c("15:27.7", "15:36.9", "15:45.2", "15:52.0", "16:10.8", 
"16:22.0", "16:35.0", "16:36.8", "16:37.8", "17:04.8", "17:08.3", 
"17:17.0", "17:23.8", "17:27.2", "18:04.3", "18:06.0", "18:24.3", 
"18:31.8", "18:59.1", "19:40.2", "19:53.1", "20:19.1"), Speaker = c("Y", 
"Y T", "Y", "T", "ATA", "Y", "T", "T", "Y", "Y T", "Y", "T", 
"Y", "Y", "Y", "Y", "A T", "T", "T", "T", "T", "A TY")), class = "data.frame", row.names = c(NA, 
-22L))

预期输出:

structure(list(In = c("15:22.5", "15:31.2", "15:38.1", "15:45.4", 
"15:52.2", "16:11.0", "16:23.4", "16:36.9", "16:47.4", "17:06.0", 
"17:11.2", "17:18.7", "18:17.5", "18:28.9", "20:01.6"), Out = c("15:27.7", 
"15:36.9", "15:45.2", "15:52.0", "16:10.8", "16:22.0", "16:36.8", 
"16:37.8", "17:04.8", "17:08.3", "17:17.0", "18:06.0", "18:24.3", 
"19:53.1", "20:19.1"), Speaker = c("Y", "Y T", "Y", "T", "ATA", 
"Y", "T", "Y", "Y T", "Y", "T", "Y", "A T", "T", "A TY")), class = "data.frame", row.names = c(NA, 
-15L))

标签: r

解决方案


我们可以使用rleidfrom data.tableie 基于“Speaker”的相似相邻值创建一个分组变量,然后summarisefirst“In”和last“Out”列的值

library(dplyr)
library(data.table)
df1 %>% 
    group_by(grp = rleid(Speaker), Speaker) %>% 
    summarise(In = first(In), Out = last(Out), .groups = 'drop') %>%
    select(names(df1))

-输出

# A tibble: 15 x 3
#   In      Out     Speaker
#   <chr>   <chr>   <chr>  
# 1 15:22.5 15:27.7 Y      
# 2 15:31.2 15:36.9 Y T    
# 3 15:38.1 15:45.2 Y      
# 4 15:45.4 15:52.0 T      
# 5 15:52.2 16:10.8 ATA    
# 6 16:11.0 16:22.0 Y      
# 7 16:23.4 16:36.8 T      
# 8 16:36.9 16:37.8 Y      
# 9 16:47.4 17:04.8 Y T    
#10 17:06.0 17:08.3 Y      
#11 17:11.2 17:17.0 T      
#12 17:18.7 18:06.0 Y      
#13 18:17.5 18:24.3 A T    
#14 18:28.9 19:53.1 T      
#15 20:01.6 20:19.1 A TY   

推荐阅读