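r - Reading and counting consecutive points
Problem description
I'm having trouble reading coordinates in a 2D space from a data.table like the one below and extracting different quality measures from them: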
library(data.table)
DT <- data.table(
A = c(rep("aa",2),rep("bb",2)),
B = c(rep("H",2),rep("Na",2)),
Low = c(0,3,1,1),
High = c(8,10,9,8),
Time =c("0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10"),
Intensity = c("0,0,0,0,561464,0,0,0,0,0,0","0,0,0,6548,5464,5616,0,0,0,68716,0","5658,12,6548,6541,8,5646854,54565,56465,546,65,0","0,561464,0,0,0,0,0,0,0,0,0")
)
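The "Time" and "Intensity" columns hold the x and y values of the 2D space. The "Low" and "High" columns are boundaries on the x axis ("Time"). Within these boundaries (strictly, < and >), I want to compute the following measures on the y ("Intensity") dimension:
- Highest number of consecutive points > 0: (row 1: 1, row 2: 2, ...)
- Total number of points > 0: (row 1: 1, row 2: 3, ...)
- Highest number of consecutive points > baseline, where the baseline is the Intensity value at the Low or the High boundary, whichever is lower (so for row 3 it is 12, for all other rows 0): (row 3: 4; for all other rows it is the same as in the first measure.)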
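As a worked illustration of the three definitions, the sketch below computes them for row 3 only (Low = 1, High = 9). It assumes that "within the boundaries" means strictly between Low and High, which is what the expected table implies, and that both boundary times occur in Time. The helper name longest_run is hypothetical; it relies on base R's rle(), which splits a logical vector into runs of equal values, so the longest run of TRUE is the longest stretch of consecutive qualifying points.

```r
# Row 3 of DT, parsed from its comma-separated strings
time      <- as.numeric(strsplit("0,1,2,3,4,5,6,7,8,9,10", ",")[[1]])
intensity <- as.numeric(strsplit("5658,12,6548,6541,8,5646854,54565,56465,546,65,0", ",")[[1]])
low  <- 1
high <- 9

# Points strictly inside the boundaries (Time 2..8)
inside <- intensity[time > low & time < high]

# Hypothetical helper: longest run of TRUE in a logical vector, 0 if there is none
longest_run <- function(flag) {
  r <- rle(flag)
  max(0, r$lengths[r$values])
}

# Baseline: the lower of the two intensities at the boundary times
# (assumes both Low and High occur exactly in Time)
baseline <- min(intensity[time == low], intensity[time == high])

first  <- longest_run(inside > 0)         # longest run of points > 0
second <- sum(inside > 0)                 # total number of points > 0
third  <- longest_run(inside > baseline)  # longest run of points > baseline
c(First = first, Second = second, Third = third)
# expected: First 7, Second 7, Third 4 (row 3 of the desired table)
```

Here the baseline is min(12, 65) = 12, and the comparison inside > 12 yields the runs TRUE TRUE / FALSE / TRUE TRUE TRUE TRUE, hence 4.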
So the output should be a table like this:
DT <- data.table(
A =c(rep("aa",2),rep("bb",2)),
B =c(rep("H",2),rep("Na",2)),
Low = c(0,3,1,1),
High = c(8,10,9,8),
Time = c("0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10"),
Intensity = c("0,0,0,0,561464,0,0,0,0,0,0","0,0,0,6548,5464,5616,0,0,0,68716,0","5658,12,6548,6541,8,5646854,54565,56465,546,65,0","0,561464,0,0,0,0,0,0,0,0,0"),
First = c(1,2,7,0),
Second= c(1,3,7,0),
Third = c(1,2,4,0)
)
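Does anyone know how to approach this task? So far I have been trying with data.table, but if anyone knows a better package for this kind of task, I'd be happy to hear about it too.
Thanks a lot!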
亚瑟尔
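Solution
Here is one approach in base R. We split (strsplit) the 'Intensity' and 'Time' columns on "," into lists, then loop over the elements of these lists together with the 'Low' and 'High' columns (Map). For each row we extract the 'Intensity' values whose 'Time' lies strictly between 'Low' and 'High' and check whether they are greater than 0; for the third measure they are compared with the baseline instead, i.e. the smaller of the 'Intensity' values at the 'Low' and 'High' positions. rle is used to find the length of the longest run of consecutive elements greater than 0 (or greater than the baseline). Each row's results are returned as a data.frame, the list of data.frames is combined with rbind, and the result is joined to the original dataset with cbind.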
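newCols <- do.call(rbind, Map(function(u, v, x, y) {
  u1 <- as.numeric(u)                 # Time values
  v1 <- as.numeric(v)                 # Intensity values
  v2 <- v1[u1 > x & u1 < y]           # intensities strictly between Low and High
  # longest run of values > 0; max(0, ...) gives 0 (instead of -Inf plus a warning) when there is none
  i1 <- with(rle(v2 > 0), max(0, lengths[values]))
  i2 <- sum(v2 > 0)                   # total number of values > 0
  lb <- match(x, u1)                  # position of Low in Time
  ub <- match(y, u1)                  # position of High in Time
  v3 <- v1[(lb + 1):(ub - 1)]
  # baseline = the lower of the two boundary intensities
  i3 <- with(rle(v3 > min(v1[c(lb, ub)])), max(0, lengths[values]))
  data.frame(First = i1, Second = i2, Third = i3)
},
strsplit(DT$Time, ","), strsplit(DT$Intensity, ","), DT$Low, DT$High))
cbind(DT, newCols)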
# A B Low High Time Intensity First Second Third
#1: aa H 0 8 0,1,2,3,4,5,6,7,8,9,10 0,0,0,0,561464,0,0,0,0,0,0 1 1 1
#2: aa H 3 10 0,1,2,3,4,5,6,7,8,9,10 0,0,0,6548,5464,5616,0,0,0,68716,0 2 3 2
#3: bb Na 1 9 0,1,2,3,4,5,6,7,8,9,10 5658,12,6548,6541,8,5646854,54565,56465,546,65,0 7 7 4
#4: bb Na 1 8 0,1,2,3,4,5,6,7,8,9,10 0,561464,0,0,0,0,0,0,0,0,0 0 0 0
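Since the question mentions data.table, here is an alternative sketch that keeps everything inside DT and adds the three columns by reference with :=. It is not the answerer's code: it replaces the match() and index arithmetic with logical filtering on Time, and the helper names longest_run and row_metrics are hypothetical. Like the answer, it assumes that the Low and High values occur exactly in each Time string.

```r
library(data.table)

# Input from the question (Time is identical in every row)
DT <- data.table(
  A = c(rep("aa", 2), rep("bb", 2)),
  B = c(rep("H", 2), rep("Na", 2)),
  Low = c(0, 3, 1, 1),
  High = c(8, 10, 9, 8),
  Time = rep("0,1,2,3,4,5,6,7,8,9,10", 4),
  Intensity = c("0,0,0,0,561464,0,0,0,0,0,0",
                "0,0,0,6548,5464,5616,0,0,0,68716,0",
                "5658,12,6548,6541,8,5646854,54565,56465,546,65,0",
                "0,561464,0,0,0,0,0,0,0,0,0")
)

# Hypothetical helper: longest run of TRUE, 0 if there is none (no -Inf warning)
longest_run <- function(flag) {
  r <- rle(flag)
  max(0, r$lengths[r$values])
}

# Hypothetical helper: the three measures for one row
row_metrics <- function(time, intensity, low, high) {
  x <- as.numeric(strsplit(time, ",", fixed = TRUE)[[1]])
  y <- as.numeric(strsplit(intensity, ",", fixed = TRUE)[[1]])
  inside   <- y[x > low & x < high]              # strictly between the boundaries
  baseline <- min(y[x == low], y[x == high])     # assumes Low and High occur in Time
  list(First  = longest_run(inside > 0),
       Second = sum(inside > 0),
       Third  = longest_run(inside > baseline))
}

# One row_metrics() call per row; rbindlist() turns the results into three columns
DT[, c("First", "Second", "Third") :=
     rbindlist(Map(row_metrics, Time, Intensity, Low, High))]
DT
```

Filtering on the Time values directly means the inner range does not depend on Low and High being at least two positions apart, which the (lb + 1):(ub - 1) indexing in the answer implicitly requires; the baseline lookup still relies on both boundary values being present in Time.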