r - Handling consecutive occurrences when there are empty cells
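Problem description
This question is related to my previous question about identifying occurrences of a value in a data frame for each id.
This time I am trying to identify, for each id, 3 or more non-w measurements. These non-w measurements occur after a consecutive run of w (the run must have a length of at least 3). I don't know how to handle the empty cells. Even when I replace them with NAs, it still does not work.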
id t1 t2 t3 t4 t5 t6 t7 t8 t9
1           w  w  w  r  s  r   # t1:t3 empty; 3 consecutive w and 3 non-w occurrences after the last w at t6
2        w  w  e  w  w  w  w   # t1:t2 empty; 4 consecutive w start at t6, but no non-w occurrence after the last w
3  w  w  w  w  w  w  s  s  s   # no empty cells; 6 consecutive w; 3 non-w occurrences start at t7
4     w  w  w  w  w  w  w  w   # t1 empty; 8 consecutive w, but no non-w occurrence after the last w
5  w  w  w  w  w  w  r  s  w   # no empty cells; consecutive w up to t6; 2 non-w occurrences, but not after the last w and fewer than 3
6     s  w  r  w  r  w  w  s   # t1 empty; 2 consecutive w and 1 non-w occurrence after the last w
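Example: below is a consecutive run of w of length 3. In t1:t3 there are empty cells; in t4:t6 there are 3 consecutive occurrences of w; and in t7:t9 there are 3 non-w occurrences (whether or not they are consecutive).
id t1 t2 t3 t4 t5 t6 t7 t8 t9
1           w  w  w  r  s  r
I would like to save the non-w occurrences in a df like this:
id t6 t7 t8 t9
1  w  r  s  r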
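What I don't know is:
- How to identify the last position of a consecutive run of w of length at least 3.
Example: how can I find out that the last w is at position t6?
id t1 t2 t3 t4 t5 t6 t7 t8 t9
1           w  w  w  r  s  r
- How to find out whether there are at least 3 consecutive non-w occurrences after the last w position (that is, after t6).
Example: how to determine that after the last w position, i.e. t6, there are non-w occurrences in t7:t9.
id t1 t2 t3 t4 t5 t6 t7 t8 t9
1           w  w  w  r  s  r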
Sample data:
df <- structure(list(id = 1:7,
  t1 = c("", "", "w", "", "w", "", "w"), t2 = c("", "", "w", "w", "w", "s", "w"), t3 = c("", "w", "w", "w", "w", "w", "w"),
  t4 = c("w", "w", "w", "w", "w", "r", "w"), t5 = c("w", "e", "w", "w", "w", "w", "r"), t6 = c("w", "w", "w", "w", "w", "r", "s"),
  t7 = c("r", "w", "s", "w", "r", "w", "t"), t8 = c("r", "w", "s", "w", "s", "w", "v"), t9 = c("r", "w", "s", "w", "w", "z", "s")),
  row.names = c(NA, -7L), class = "data.frame")
df
Output df:
id t6 t7 t8 t9
1  w  r  s  r
3  w  s  s  s
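There is also a special case where the sequences do not line up in time. For example, in the df below the last occurrence of w for id 7 is at t4 rather than at t6 as in the other cases.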
id t1 t2 t3 t4 t5 t6 t7 t8 t9
1           w  w  w  r  r  r
2        w  w  e  w  w  w  w
3  w  w  w  w  w  w  s  s  s
4     w  w  w  w  w  w  w  w
5  w  w  w  w  w  w  r  s  w
6     s  w  r  w  r  w  w  z
7  w  w  w  w  r  s  t  v  s
This output would be more complicated. Would it be easier to drop the w's when the consecutive run has a length of at least 3 and keep only the second part of the sequence?
id t4 t5 t6 t7 t8 t9
1        w  r  s  r
3        w  s  s  s
7  w  r  s  t  v  s
Solution
Using apply row-wise:
mat <- apply(df[-1], 1, function(x) {
  # rle finds runs of consecutive 'w' (empty strings simply compare as FALSE)
  a1 <- rle(x == 'w')
  # A row without any 'w' cannot qualify
  if (!any(a1$values)) return(NULL)
  # Index of the last run of 'w' in the rle output
  a2 <- max(which(a1$values))
  # Position of the last 'w' in x
  a3 <- sum(a1$lengths[1:a2])
  # If the last run of 'w' has length >= 3 and
  # there are at least 3 values after the last 'w'
  if (a1$lengths[a2] >= 3 && length(x) >= a3 + 3)
    # keep the last 'w' and everything after it
    x[a3:length(x)]
})
# Number of kept values per row (0 for rows that returned NULL)
n <- lengths(mat)
# Largest n, i.e. the number of t columns in the final data frame
m <- max(n)
# Left-pad shorter elements with NA so that all have length m
new_mat <- t(sapply(mat[n > 0], function(x) c(rep(NA, m - length(x)), x)))
# Create the new data frame
data.frame(id = df$id[n > 0], new_mat)
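To see what the rle step does, it may help to walk through a single row. The sketch below uses the values of id 1 from the data in this answer; the names runs, last_run and last_pos are only illustrative stand-ins for a1, a2 and a3.

```r
# Row for id 1: t1:t3 are empty strings, t4:t6 are "w", t7:t9 are non-w
x <- c(t1 = "", t2 = "", t3 = "", t4 = "w", t5 = "w", t6 = "w",
       t7 = "r", t8 = "r", t9 = "r")

# "" == "w" is FALSE, so empty cells need no special treatment here
runs <- rle(unname(x == "w"))
runs$values    # FALSE TRUE FALSE
runs$lengths   # 3 3 3

last_run <- max(which(runs$values))         # index of the last run of "w": 2
last_pos <- sum(runs$lengths[1:last_run])   # position of the last "w" in x: 6
names(x)[last_pos]                          # "t6"

# With NA instead of "", the comparison yields NA rather than FALSE
is.na(NA_character_ == "w")                 # TRUE
```

The last line hints at why replacing the blanks with NA can get in the way: the comparison then produces NA instead of a clean TRUE/FALSE.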
Data
df <- structure(list(id = 1:7, t1 = c("", "", "w", "", "w", "", "w"
), t2 = c("", "", "w", "w", "w", "s", "w"), t3 = c("", "w", "w",
"w", "w", "w", "w"), t4 = c("w", "w", "w", "w", "w", "r", "w"
), t5 = c("w", "e", "w", "w", "w", "w", "r"), t6 = c("w", "w",
"w", "w", "w", "r", "s"), t7 = c("r", "w", "s", "w", "r", "w",
"t"), t8 = c("r", "w", "s", "w", "s", "w", "v"), t9 = c("r",
"w", "s", "w", "w", "z", "s")), class = "data.frame", row.names = c(NA,-7L))
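If the blanks have already been replaced with NA, one possible variation is to treat NA as "not w" before calling rle. This is only a sketch: tail_after_last_w and its arguments min_run and min_tail are hypothetical names, not part of the answer above.

```r
# Hypothetical helper: returns the last "w" plus everything after it,
# or NULL when the row does not qualify.
tail_after_last_w <- function(x, min_run = 3, min_tail = 3) {
  # Treat both "" and NA as "not w"
  is_w <- !is.na(x) & x == "w"
  runs <- rle(unname(is_w))
  if (!any(runs$values)) return(NULL)              # no "w" in this row
  last_run <- max(which(runs$values))              # last run of "w"
  last_pos <- sum(runs$lengths[seq_len(last_run)]) # position of the last "w"
  if (runs$lengths[last_run] >= min_run && length(x) - last_pos >= min_tail) {
    x[last_pos:length(x)]
  } else {
    NULL
  }
}

tail_after_last_w(c("", "", "", "w", "w", "w", "r", "r", "r"))     # "w" "r" "r" "r"
tail_after_last_w(c(NA, NA, NA, "w", "w", "w", "r", "s", "r"))     # "w" "r" "s" "r"
tail_after_last_w(c("w", "w", "w", "w", "w", "w", "r", "s", "w"))  # NULL
```

Such a function could be used in place of the anonymous function passed to apply; the padding and data.frame steps would stay the same.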