r - Extracting rows around a pattern in R
Problem description
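I have a data.frame test in which I want to identify, for each id, what comes before and after every bar-foo pattern. The two messages in the pattern must be consecutive by timestamp. In the example below, the bar-foo pattern occurs three times.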
> test
timestamp id message result
1 2019-01-01 00:00:21 1 bar negative
2 2019-01-01 00:00:58 1 bar positive
3 2019-01-01 00:01:35 1 foo positive
4 2019-01-01 00:03:02 1 bar negative
5 2019-01-01 00:06:42 1 baz positive
6 2019-01-01 00:07:16 1 baz positive
7 2019-01-01 00:07:39 1 bar positive
8 2019-01-01 00:09:14 2 bar negative
9 2019-01-01 00:09:56 2 foo negative
10 2019-01-01 00:10:56 2 foo positive
11 2019-01-01 00:11:13 2 foo negative
12 2019-01-01 00:11:32 2 foo positive
13 2019-01-01 00:11:49 2 bar negative
14 2019-01-01 00:12:18 2 foo positive
15 2019-01-01 00:15:28 2 bar positive
So the ideal output would look like this:
> output
before after id
1 negative negative 1
2 <NA> positive 2
3 positive positive 2
The code I use below works, but it looks complicated and inefficient:
library(dplyr)

test %>%
  group_by(id) %>%
  mutate(next.message = lead(message, order_by = timestamp),
         previous.result = lag(result, order_by = timestamp),
         next.result = lead(result, n = 2, order_by = timestamp)) %>%
  filter(message == 'bar', next.message == 'foo') %>%
  filter_all(any_vars(!is.na(.))) %>%
  select(-c(timestamp, message, result, next.message)) %>%
  rename(before = previous.result, after = next.result)
What is a better way to solve this using dplyr or data.table functions?
Sample data:
dput(test)
structure(list(timestamp = structure(c(1546318821, 1546318858,
1546318895, 1546318982, 1546319202, 1546319236, 1546319259, 1546319354,
1546319396, 1546319456, 1546319473, 1546319492, 1546319509, 1546319538,
1546319728), class = c("POSIXct", "POSIXt")), id = c(1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2), message = c("bar", "bar",
"foo", "bar", "baz", "baz", "bar", "bar", "foo", "foo", "foo",
"foo", "bar", "foo", "bar"), result = c("negative", "positive",
"positive", "negative", "positive", "positive", "positive", "negative",
"negative", "positive", "negative", "positive", "negative", "positive",
"positive")), row.names = c(NA, -15L), class = "data.frame")
Solution
Perhaps something like this with data.table:
library(data.table)
setDT(test)
test[,
  {
    # find the rows where message is bar and the next message is foo
    v <- .I[message == "bar" & shift(message, -1L, fill = "") == "foo"]
    # extract the previous result; use NA if it falls before the first row index of the current id
    .(before = test[replace(v - 1L, v - 1L < min(.I), NA_integer_), result],
      # extract the result two rows ahead; use NA if it falls past the last row index of the current id
      after = test[replace(v + 2L, v + 2L > max(.I), NA_integer_), result])
  },
  by = .(id)]
Output:
id before after
1: 1 negative negative
2: 2 <NA> positive
3: 2 positive positive
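The answer above indexes test by global row number, so it appears to rely on the rows already being sorted by timestamp within each id. As a possible alternative, here is a minimal sketch that sorts explicitly and uses grouped shift() calls instead of row arithmetic. The helper column name next_message and the result name out are illustrative choices, not from the original post, and the time zone is set to UTC only to make the example reproducible (it does not affect ordering).

```r
library(data.table)

# Sample data rebuilt from the question's dput() output
test <- data.table(
  timestamp = as.POSIXct(
    c(1546318821, 1546318858, 1546318895, 1546318982, 1546319202,
      1546319236, 1546319259, 1546319354, 1546319396, 1546319456,
      1546319473, 1546319492, 1546319509, 1546319538, 1546319728),
    origin = "1970-01-01", tz = "UTC"),
  id = rep(c(1, 2), times = c(7, 8)),
  message = c("bar", "bar", "foo", "bar", "baz", "baz", "bar", "bar",
              "foo", "foo", "foo", "foo", "bar", "foo", "bar"),
  result = c("negative", "positive", "positive", "negative", "positive",
             "positive", "positive", "negative", "negative", "positive",
             "negative", "positive", "negative", "positive", "positive")
)

# Sort by id, then timestamp, so shift() follows time order within each id
setorder(test, id, timestamp)

# Per id: the next message, the previous result, and the result two rows ahead.
# shift() pads with NA where there is no such row inside the group.
test[, `:=`(
  next_message = shift(message, 1L, type = "lead"),
  before       = shift(result, 1L, type = "lag"),
  after        = shift(result, 2L, type = "lead")
), by = id]

# Keep the rows that start a bar-foo pair; data.table treats NA in a
# logical i expression as FALSE, so group-final rows drop out
out <- test[message == "bar" & next_message == "foo", .(id, before, after)]
print(out)
```

Because the shifts are computed by id, values never leak across groups, and no manual bounds check against min(.I) or max(.I) is needed. Note that this sketch adds three helper columns to test by reference; work on copy(test) if the original table must stay unchanged.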