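r - How to flag groups in a df based on a specific sequence of values in a column
Question
I have a data frame with id and value columns as shown below, and I want to determine the status column per id group based on the values in the value column.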
x <- data.frame(id = c(rep(1,10), rep(2,10), rep(3,10)),
                serial = rep(1:10,3),
                value = c(rep(1,4), rep(0,3), rep(1,3),
                          rep(1,4), rep(0,1), rep(-1,2), rep(1,3),
                          rep(c(1,0),5)),
                status = c(rep("Fluctuating", 10),
                           rep("Fluctuating", 10),
                           rep("Not fluctuating", 10)))
id serial value status
1 1 1 1 Fluctuating
2 1 2 1 Fluctuating
3 1 3 1 Fluctuating
4 1 4 1 Fluctuating
5 1 5 0 Fluctuating
6 1 6 0 Fluctuating
7 1 7 0 Fluctuating
8 1 8 1 Fluctuating
9 1 9 1 Fluctuating
10 1 10 1 Fluctuating
11 2 1 1 Fluctuating
12 2 2 1 Fluctuating
13 2 3 1 Fluctuating
14 2 4 1 Fluctuating
15 2 5 0 Fluctuating
16 2 6 -1 Fluctuating
17 2 7 -1 Fluctuating
18 2 8 1 Fluctuating
19 2 9 1 Fluctuating
20 2 10 1 Fluctuating
21 3 1 1 Not fluctuating
22 3 2 0 Not fluctuating
23 3 3 1 Not fluctuating
24 3 4 0 Not fluctuating
25 3 5 1 Not fluctuating
26 3 6 0 Not fluctuating
27 3 7 1 Not fluctuating
28 3 8 0 Not fluctuating
29 3 9 1 Not fluctuating
30 3 10 0 Not fluctuating
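Here, a group is considered fluctuating if three or more 1s are followed by three or more values that are 0 or -1, followed again by three or more 1s. Three or more alternating runs such as 0s-1s-0s, -1s-0s-1s, etc. would also be considered fluctuating.
What would be the best way to assign the status column, preferably using dplyr?
Thanks!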
Answer
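The code in this answer relies on run-length encoding. As a small illustration that is not part of the original answer, base R's rle() collapses a vector into consecutive runs; the vectors v1 and v3 below are just the value entries for id 1 and id 3 from the question's sample data, and the variable names are made up for this sketch.

```r
# Values for id 1 and id 3, copied from the sample data in the question
v1 <- c(rep(1, 4), rep(0, 3), rep(1, 3))
v3 <- rep(c(1, 0), 5)

# Runs of "positive" versus "zero or negative" for id 1
r1 <- rle(v1 > 0)
r1$lengths  # 4 3 3
r1$values   # TRUE FALSE TRUE

# id 3 flips on every row, so every run has length 1
r3 <- rle(v3 > 0)
r3$lengths  # ten runs, each of length 1
```

Group 1 has three consecutive runs of length 3 or more in the order TRUE, FALSE, TRUE, which is the pattern the question calls fluctuating. Group 3 has no run longer than 1, so it cannot match.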
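library(dplyr)
# library(zoo) # only needed for rollapply, which is called as zoo::rollapply below

# TRUE if the runs of `z > 0` contain the pattern `ptn` in consecutive runs
# that are each at least `minlen` long.
threes <- function(z, minlen = 3L, ptn = c(TRUE, FALSE, TRUE)) {
  r <- rle(z > 0)
  # The window width is the number of runs in the pattern, not the minimum run length
  starts <- zoo::rollapply(r$lengths >= minlen, length(ptn), all, fill = FALSE, align = "left")
  for (st in which(starts)) {
    if (all(r$values[st + seq_along(ptn) - 1L] == ptn)) return(TRUE)
  }
  return(FALSE)
}

# Evaluate once per id and label every row of the group
x %>%
  group_by(id) %>%
  mutate(status2 = paste0(if (threes(value)) "" else "Not ", "Fluctuating")) %>%
  ungroup() %>%
  print(n = 99)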
# # A tibble: 30 x 5
# id serial value status status2
# <dbl> <int> <dbl> <chr> <chr>
# 1 1 1 1 Fluctuating Fluctuating
# 2 1 2 1 Fluctuating Fluctuating
# 3 1 3 1 Fluctuating Fluctuating
# 4 1 4 1 Fluctuating Fluctuating
# 5 1 5 0 Fluctuating Fluctuating
# 6 1 6 0 Fluctuating Fluctuating
# 7 1 7 0 Fluctuating Fluctuating
# 8 1 8 1 Fluctuating Fluctuating
# 9 1 9 1 Fluctuating Fluctuating
# 10 1 10 1 Fluctuating Fluctuating
# 11 2 1 1 Fluctuating Fluctuating
# 12 2 2 1 Fluctuating Fluctuating
# 13 2 3 1 Fluctuating Fluctuating
# 14 2 4 1 Fluctuating Fluctuating
# 15 2 5 0 Fluctuating Fluctuating
# 16 2 6 -1 Fluctuating Fluctuating
# 17 2 7 -1 Fluctuating Fluctuating
# 18 2 8 1 Fluctuating Fluctuating
# 19 2 9 1 Fluctuating Fluctuating
# 20 2 10 1 Fluctuating Fluctuating
# 21 3 1 1 Not fluctuating Not Fluctuating
# 22 3 2 0 Not fluctuating Not Fluctuating
# 23 3 3 1 Not fluctuating Not Fluctuating
# 24 3 4 0 Not fluctuating Not Fluctuating
# 25 3 5 1 Not fluctuating Not Fluctuating
# 26 3 6 0 Not fluctuating Not Fluctuating
# 27 3 7 1 Not fluctuating Not Fluctuating
# 28 3 8 0 Not fluctuating Not Fluctuating
# 29 3 9 1 Not fluctuating Not Fluctuating
# 30 3 10 0 Not fluctuating Not Fluctuating
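The function above only looks for positive, non-positive, positive. If the mirrored case from the question (for example 0s-1s-0s) should also count, one possible variant is sketched below. It is an assumption-laden sketch rather than the answer's method: has_long_runs and its nruns argument are hypothetical names, it uses only base R (embed() instead of zoo), it treats 0 and -1 alike as "not positive", and it assumes value contains no NA.

```r
# Hypothetical helper, not from the original answer.
# TRUE when `z` contains `nruns` consecutive runs of (z > 0), each at least
# `minlen` long. Consecutive runs from rle() always alternate, so this
# matches both 1s-0s-1s and 0s-1s-0s.
has_long_runs <- function(z, minlen = 3L, nruns = 3L) {
  long <- rle(z > 0)$lengths >= minlen
  # embed() errors when asked for more columns than there are runs
  if (length(long) < nruns) return(FALSE)
  # Each row of `windows` holds one window of `nruns` consecutive runs
  windows <- embed(as.integer(long), nruns)
  any(rowSums(windows) == nruns)
}

has_long_runs(c(rep(1, 4), rep(0, 3), rep(1, 3)))  # id 1: TRUE
has_long_runs(rep(c(1, 0), 5))                     # id 3: FALSE
has_long_runs(c(rep(0, 3), rep(1, 3), rep(0, 3)))  # mirrored pattern: TRUE
```

To use it with the grouped pipeline, has_long_runs(value) could stand in for threes(value) inside the mutate() call.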