Identifying distinct sequences with the same elements

Problem description

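I want to get a sequence vector, numbered within each group, that is not thrown off by values that repeat later in the group.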

group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3 )

x = c("B","B",NA,"A","B","C","D", "A","A",NA,"A","A","A", "D","A","A","D","C","D")

dad = data.frame(group, x)

Expected vector

out = c(1,1,NA,2,3,4,5, 1,1,NA,1,1,1, 1,2,2,3,4,5)

dad = cbind(dad, out)

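That is, in group 1 for example, the element "B" appears again, but because the sequence has changed in between, the numbering must continue rather than go back to 1. NA values stay NA in the output.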

Tags: r

Solution


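One option is data.table. Convert the 'data.frame' to a 'data.table' (setDT(dad)), group by 'group', use a logical index in i to select only the rows where 'x' is not NA, and assign the run-length id (rleid) of 'x' to a new column 'ind'. Rows excluded by the index keep NA in 'ind'.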

library(data.table)
setDT(dad)[!is.na(x), ind := rleid(x), by = group]
dad
#    group    x ind
#1:     1    B   1
#2:     1    B   1
#3:     1 <NA>  NA
#4:     1    A   2
#5:     1    B   3
#6:     1    C   4
#7:     1    D   5
#8:     2    A   1
#9:     2    A   1
#10:    2 <NA>  NA
#11:    2    A   1
#12:    2    A   1
#13:    2    A   1
#14:    3    D   1
#15:    3    A   2
#16:    3    A   2
#17:    3    D   3
#18:    3    C   4
#19:    3    D   5
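
For readers who cannot use data.table, the same idea can be sketched in base R. This is only an illustration and not part of the original answer: run_id() is a hypothetical helper name introduced here, and stringsAsFactors = FALSE is passed on the assumption that 'x' should stay a character vector (rle() does not accept factors). The helper drops the NA entries, numbers the runs of what remains with rle(), and writes the numbers back to the non-NA positions, which is what the i index plus rleid() does above.

```r
group <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
x <- c("B", "B", NA, "A", "B", "C", "D",
       "A", "A", NA, "A", "A", "A",
       "D", "A", "A", "D", "C", "D")
dad <- data.frame(group, x, stringsAsFactors = FALSE)

# run_id() is a hypothetical helper, not a base R or data.table function.
# It numbers consecutive runs of equal values, skipping NA entries.
run_id <- function(v) {
  id <- rep(NA_integer_, length(v))   # NA positions stay NA
  keep <- !is.na(v)
  r <- rle(v[keep])                   # runs among the non-NA values only
  id[keep] <- rep(seq_along(r$lengths), r$lengths)
  id
}

# Apply the helper per group and put the pieces back in the original row order.
dad$ind <- unsplit(lapply(split(dad$x, dad$group), run_id), dad$group)
dad
```

unsplit() is used here instead of ave() because ave() would coerce the integer ids back to the character type of 'x'.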
