Reading and counting consecutive points

Question

I am having trouble reading coordinates in a 2D space from a data.table like the one below and deriving several properties from them:

library(data.table)

DT <- data.table(
  A = c(rep("aa", 2), rep("bb", 2)),
  B = c(rep("H", 2), rep("Na", 2)),
  Low = c(0, 3, 1, 1),
  High = c(8, 10, 9, 8),
  Time = c("0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10"),
  Intensity = c("0,0,0,0,561464,0,0,0,0,0,0", "0,0,0,6548,5464,5616,0,0,0,68716,0", "5658,12,6548,6541,8,5646854,54565,56465,546,65,0", "0,561464,0,0,0,0,0,0,0,0,0")
)

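The "Time" and "Intensity" columns hold the x and y values of the 2D space. The "Low" and "High" columns are boundaries on the x axis ("Time"). Strictly inside these boundaries (Low < Time < High), I want to check several properties of the y dimension ("Intensity"):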

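  1. Longest run of consecutive points > 0 (row 1: 1, row 2: 2, ...)
  2. Total number of points > 0 (row 1: 1, row 2: 3, ...)
  3. Longest run of consecutive points > baseline, where the baseline is the intensity at the Low or the High boundary, whichever is lower (so 12 for row 3 and 0 for the other rows). (Row 3: 4; for all other rows the result is the same as in 1.)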
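To make the three definitions concrete, here is a small worked sketch for row 3 only (Low = 1, High = 9). The variable names are made up for illustration, and it assumes that both boundary values actually occur in the Time vector.

```r
# Row 3 of the example data: Low = 1, High = 9
time      <- 0:10
intensity <- c(5658, 12, 6548, 6541, 8, 5646854, 54565, 56465, 546, 65, 0)
low  <- 1
high <- 9

# Points strictly between the two boundaries (times 2 to 8)
inside <- intensity[time > low & time < high]

# Baseline: the lower of the two intensities at the boundaries
baseline <- min(intensity[time == low], intensity[time == high])  # min(12, 65) = 12

# 1. Longest run of consecutive points > 0
runs_pos <- rle(inside > 0)
first <- max(runs_pos$lengths[runs_pos$values], 0)    # 7

# 2. Total number of points > 0
second <- sum(inside > 0)                             # 7

# 3. Longest run of consecutive points > baseline
runs_base <- rle(inside > baseline)
third <- max(runs_base$lengths[runs_base$values], 0)  # 4 (the value 8 splits the run)
```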

So the output should be a table like this:

DT <- data.table(
  A = c(rep("aa", 2), rep("bb", 2)),
  B = c(rep("H", 2), rep("Na", 2)),
  Low = c(0, 3, 1, 1),
  High = c(8, 10, 9, 8),
  Time = c("0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10"),
  Intensity = c("0,0,0,0,561464,0,0,0,0,0,0", "0,0,0,6548,5464,5616,0,0,0,68716,0", "5658,12,6548,6541,8,5646854,54565,56465,546,65,0", "0,561464,0,0,0,0,0,0,0,0,0"),
  First = c(1, 2, 7, 0),
  Second = c(1, 3, 7, 0),
  Third = c(1, 2, 4, 0)
)

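Does anyone know how to approach this task? So far I have been trying with data.table, but if someone knows a better package for this kind of task, I would be happy with that too.

Thank you very much!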

亚瑟尔

Tags: r, data.table

Answer


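Here is one approach using base R. We split the 'Time' and 'Intensity' columns on "," into lists with strsplit, then loop over the list elements together with the 'Low' and 'High' columns using Map. For each row we extract the 'Intensity' values whose 'Time' lies strictly between 'Low' and 'High' and check whether they are greater than 0 (or, for the third column, greater than the lower of the two intensities at the 'Low' and 'High' boundaries). rle is then used to find the length of the longest run of consecutive elements that satisfy the condition. Each iteration returns a one-row data.frame; the rows are combined with rbind and attached to the original dataset with cbind.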
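The key step is the rle call, so a short illustration of what it returns may help. This is only a sketch on the row 2 values that lie between Low = 3 and High = 10; the variable names are made up for the example.

```r
# Intensities of row 2 at times 4 to 9
flag <- c(5464, 5616, 0, 0, 0, 68716) > 0

# rle() collapses the logical vector into runs of equal values
runs <- rle(flag)
runs$lengths   # 2 3 1
runs$values    # TRUE FALSE TRUE

# Keep only the runs of TRUE and take the longest one
longest <- max(runs$lengths[runs$values], 0)   # 2

# With no TRUE at all the selection is empty, so the extra 0 acts as
# the fallback and avoids the warning that max() gives on empty input
none <- rle(c(0, 0, 0) > 0)
longest_none <- max(none$lengths[none$values], 0)   # 0
```

The full solution applies the same rle step once per row, first with the condition "> 0" and then with "> baseline".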

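newCols <- do.call(rbind, Map(function(u, v, x, y) {
     u1 <- as.numeric(u)            # time points of this row
     v1 <- as.numeric(v)            # intensities of this row
     v2 <- v1[u1 > x & u1 < y]      # intensities strictly between Low and High
     # First: longest run of consecutive points > 0 (0 if there is none)
     i1 <- with(rle(v2 > 0), max(lengths[values], 0))
     # Second: total number of points > 0
     i2 <- sum(v2 > 0)
     # Baseline: the lower of the intensities at the Low and High boundaries
     baseline <- min(v1[c(match(x, u1), match(y, u1))])
     # Third: longest run of consecutive points > baseline
     i3 <- with(rle(v2 > baseline), max(lengths[values], 0))
     data.frame(First = i1, Second = i2, Third = i3)
     },
     strsplit(DT$Time, ","), strsplit(DT$Intensity, ","), DT$Low, DT$High))

cbind(DT, newCols)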
#    A  B Low High                   Time                                        Intensity First Second Third
#1: aa  H   0    8 0,1,2,3,4,5,6,7,8,9,10                       0,0,0,0,561464,0,0,0,0,0,0     1      1     1
#2: aa  H   3   10 0,1,2,3,4,5,6,7,8,9,10               0,0,0,6548,5464,5616,0,0,0,68716,0     2      3     2
#3: bb Na   1    9 0,1,2,3,4,5,6,7,8,9,10 5658,12,6548,6541,8,5646854,54565,56465,546,65,0     7      7     4
#4: bb Na   1    8 0,1,2,3,4,5,6,7,8,9,10                       0,561464,0,0,0,0,0,0,0,0,0     0      0     0
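
Since the question asks about data.table specifically, the same logic can also be written so that the three columns are added by reference with `:=`. This is only a sketch, not part of the original answer: `longest_run` and `count_points` are hypothetical helper names, and it assumes that each Low and High value occurs in the row's Time vector.

```r
library(data.table)

DT <- data.table(
  A = c("aa", "aa", "bb", "bb"),
  B = c("H", "H", "Na", "Na"),
  Low = c(0, 3, 1, 1),
  High = c(8, 10, 9, 8),
  Time = rep("0,1,2,3,4,5,6,7,8,9,10", 4),
  Intensity = c("0,0,0,0,561464,0,0,0,0,0,0",
                "0,0,0,6548,5464,5616,0,0,0,68716,0",
                "5658,12,6548,6541,8,5646854,54565,56465,546,65,0",
                "0,561464,0,0,0,0,0,0,0,0,0")
)

# Hypothetical helper: longest run of TRUE in a logical vector (0 if none)
longest_run <- function(flag) {
  r <- rle(flag)
  max(r$lengths[r$values], 0)
}

# Hypothetical helper: the three counts for a single row
count_points <- function(time_str, int_str, low, high) {
  time <- as.numeric(strsplit(time_str, ",", fixed = TRUE)[[1]])
  int  <- as.numeric(strsplit(int_str, ",", fixed = TRUE)[[1]])
  inside   <- int[time > low & time < high]              # strictly inside the bounds
  baseline <- min(int[time == low], int[time == high])   # lower boundary intensity
  list(First  = longest_run(inside > 0),
       Second = sum(inside > 0),
       Third  = longest_run(inside > baseline))
}

# Apply row by row, stack the results, and add them as new columns by reference
DT[, c("First", "Second", "Third") :=
       rbindlist(Map(count_points, Time, Intensity, Low, High))]
```

Compared with cbind, this modifies DT in place instead of creating a copy, which may matter for larger tables.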
