Determine if next number in a time series is the max of time series so far (for grouped df)

Problem description

I am looking at time series data and trying to identify historical maximums.

I am trying to do this by iterating over a vector and checking if the value I am looking at is greater than or equal to the max of the data up to this point. I can write a function for this, but I am struggling when I want to apply it to a grouped data frame.

Here is an example:

set.seed(32)
x <- data.frame(time = c(1:6), 
                value = runif(6))
> x
  time     value
1    1 0.5058405
2    2 0.5948084
3    3 0.8087471
4    4 0.7288197
5    5 0.1519876
6    6 0.9561873

# Write a function to identify the records.
# The function takes an index and checks whether the value at that index is
# greater than or equal to the maximum of all values up to and including it.
max_v <- function(index) {
  output <- x$value[index] >= max(x$value[1:index])
  output
}

#create the record variable
x$record <- sapply(1:nrow(x), max_v)
> x
  time     value record
1    1 0.5058405   TRUE
2    2 0.5948084   TRUE
3    3 0.8087471   TRUE
4    4 0.7288197  FALSE
5    5 0.1519876  FALSE
6    6 0.9561873   TRUE

The function works well. However, I want to apply it to a data frame grouped by the type variable created below:

set.seed(32)
x <- data.frame(time = rep(c(1:6),2), 
                type = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                value = runif(12))
> x
   time type     value
1     1    1 0.5058405
2     2    1 0.5948084
3     3    1 0.8087471
4     4    1 0.7288197
5     5    1 0.1519876
6     6    1 0.9561873
7     1    2 0.7535377
8     2    2 0.8520623
9     3    2 0.6734418
10    4    2 0.3871255
11    5    2 0.6580025
12    6    2 0.3213696

What I want is:

> x
   time type     value record
1     1    1 0.5058405   TRUE
2     2    1 0.5948084   TRUE
3     3    1 0.8087471   TRUE
4     4    1 0.7288197  FALSE
5     5    1 0.1519876  FALSE
6     6    1 0.9561873   TRUE
7     1    2 0.7535377   TRUE
8     2    2 0.8520623   TRUE
9     3    2 0.6734418  FALSE
10    4    2 0.3871255  FALSE
11    5    2 0.6580025  FALSE
12    6    2 0.3213696  FALSE

I have tried group_map and tapply, but I can't seem to get intelligible results, as I don't know how to pass the vector of indexes that I want to apply/map over.

Tags: r

Solution


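You can compare each value against the cumulative maximum within its group: a value is a record exactly when it equals cummax() at its own position. Because ave() returns a vector of the same type as value (numeric), the comparison comes back as 0/1, so the result is wrapped in as.logical().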

x$record <- as.logical(with(x, ave(value, type, FUN = \(v) v == cummax(v))))
x

   time type     value record
1     1    1 0.5058405   TRUE
2     2    1 0.5948084   TRUE
3     3    1 0.8087471   TRUE
4     4    1 0.7288197  FALSE
5     5    1 0.1519876  FALSE
6     6    1 0.9561873   TRUE
7     1    2 0.7535377   TRUE
8     2    2 0.8520623   TRUE
9     3    2 0.6734418  FALSE
10    4    2 0.3871255  FALSE
11    5    2 0.6580025  FALSE
12    6    2 0.3213696  FALSE
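
To see why comparing against the cumulative maximum gives the same result as the index-based function in the question, here is a minimal base R sketch. The vectors v and g are hand-picked illustrative values, not data from the question, and the variable names are assumptions made for this example.

```r
# Hand-picked values (hypothetical, not from the question) so the result can be checked by eye
v <- c(2, 5, 3, 5, 7, 1)

# cummax(v) is the running maximum up to and including each position
running_max <- cummax(v)

# A value is a record when it equals the running maximum at its own position
is_record <- v == running_max

# The index-based check from the question, written against v instead of x$value
by_index <- sapply(seq_along(v), function(i) v[i] >= max(v[1:i]))

# Both approaches agree, including on ties (the second 5 counts as a record)
stopifnot(identical(is_record, by_index))

# Grouped version: ave() applies the function separately within each level of g
g <- c("a", "a", "a", "b", "b", "b")
is_record_by_group <- as.logical(ave(v, g, FUN = function(z) z == cummax(z)))
```

Since max(v[1:i]) always includes v[i] itself, the >= test can only succeed when the two are equal, which is why == against cummax() is equivalent. If dplyr is already in use, the same comparison should also work inside group_by(type) followed by mutate(record = value == cummax(value)).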
