R: n previous elements of another column in a data frame

Problem description

I would like to add a new column that contains the vector (or list) of the previous n elements from another column. The computation is done after grouping.

Here is an example with n=2. Input:

v0 = c(rep("a",5),rep("b",5))
v1 = 1:10
DF1 <- data.frame(v0,v1)

> DF1
   v0 v1
1   a  1
2   a  2
3   a  3
4   a  4
5   a  5
6   b  6
7   b  7
8   b  8
9   b  9
10  b 10

Output: for each row, the new column should hold a vector of integers (or a list) with the following values:

> DF2_L
   v0 v1    myL
1   a  1 NA, NA
2   a  2  1, NA
3   a  3   2, 1
4   a  4   3, 2
5   a  5   4, 3
6   b  6   5, 4
7   b  7   6, 5
8   b  8   7, 6
9   b  9   8, 7
10  b 10   9, 8

A simple solution would be

library(dplyr)

DF2 <- DF1 %>% group_by(v0) %>%
  mutate(i1 = lag(v1, 1), i2 = lag(v1, 2),
         myL = mapply(c, i1, i2, SIMPLIFY = FALSE)) %>%
  select(-c(i1, i2))

But this is only a simplified example. In my real computation n is 36, which means I would have to create 36 temporary columns for lag(v1,1), lag(v1,2), ..., lag(v1,36) and then delete them after combining their values into a list. This is not convenient; there must be a better way.
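One way to avoid the n temporary columns is base R's embed(), which builds the whole matrix of lagged values in a single call. The sketch below is an illustration under stated assumptions, not the asker's or the answerer's method: prev_n() and prev_n_grouped() are hypothetical helper names, and every group is assumed to be non-empty. Because it lags within each group (as "after grouping" requires), rows 6 and 7, the first two rows of group b, come out as NA, NA and 6, NA rather than the 5, 4 and 6, 5 shown in the expected output above.

```r
# Hypothetical helper: for each position t of x, return c(x[t-1], x[t-2], ..., x[t-n]),
# padded with NA where fewer than n earlier values exist.
prev_n <- function(x, n) {
  padded <- c(rep(NA, n), x)  # n leading NAs so every position has n predecessors
  # Row t of embed(padded, n + 1) is padded[t + n], padded[t + n - 1], ..., padded[t];
  # dropping column 1 (the current value) leaves the n previous values, newest first.
  m <- embed(padded, n + 1)[, -1, drop = FALSE]
  lapply(seq_len(nrow(m)), function(i) m[i, ])
}

# Hypothetical wrapper: apply prev_n() separately within each level of g.
prev_n_grouped <- function(x, g, n) {
  out <- vector("list", length(x))
  for (idx in split(seq_along(x), g)) {
    out[idx] <- prev_n(x[idx], n)
  }
  out
}

v0 <- c(rep("a", 5), rep("b", 5))
v1 <- 1:10
DF1 <- data.frame(v0, v1)

# One list element per row; n = 36 works the same way, with no temporary columns.
DF1$myL <- prev_n_grouped(DF1$v1, DF1$v0, n = 2)
DF1$myL[[7]]  # 6, NA: row 7 has only one earlier value inside group "b"
```

With dplyr, calling mutate(myL = prev_n(v1, 2)) after group_by(v0) should give the same list column, since mutate() accepts a list whose length equals the group size.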

I thought of using rollapply with FUN = list, but I get this error message:

t <- DF1 %>% group_by(v0) %>% 
  mutate( myL= rollapply(lag(v1),
                         2, fill=NA, align="right",
                         list))

Error: Problem with `mutate()` input `myL`.
x “x” : attempt to define invalid zoo object
i Input `myL` is `rollapply(lag(v1), 2, fill = NA, align = "right", list)`.
i The error occurred in group 1: v0 = "a".
Run `rlang::last_error()` to see where the error occurred.

When I use FUN = c, I get each element as a separate column:

t <- DF1 %>% group_by(v0) %>% 
  mutate( myL= rollapply(lag(v1),
                         2, fill=NA, align="right",
                         c))

   v0       v1 myL[,1]  [,2]
   <fct> <int>   <int> <int>
 1 a         1      NA    NA
 2 a         2      NA     1
 3 a         3       1     2
 4 a         4       2     3
 5 a         5       3     4
 6 b         6      NA    NA
 7 b         7      NA     6
 8 b         8       6     7
 9 b         9       7     8
10 b        10       8     9
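The FUN = c attempt already produces the right numbers; they are just stored as a matrix column with the oldest value first. Splitting that matrix by row and reversing each row gives the desired list. A minimal base-R sketch, where win is a hypothetical hand-built copy of the group a rows printed above standing in for the actual rollapply() result:

```r
# Stand-in for the group "a" rows of the matrix column printed above
# (one row per observation, oldest value in column 1).
win <- rbind(c(NA, NA), c(NA, 1L), c(1L, 2L), c(2L, 3L), c(3L, 4L))

# Split the matrix by row and reverse each row so the most recent value comes first.
myL <- lapply(seq_len(nrow(win)), function(i) rev(win[i, ]))
```

Inside the grouped mutate(), the same lapply() could be applied to the rollapply(lag(v1), 2, fill = NA, align = "right", c) result (assigned to a temporary first); this has not been checked against a specific zoo version.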

Tags: r, dataframe, analytics, lag, rolling-computation

Solution


Would this correspond to what you want?

library(dplyr)
library(purrr)
library(tidyr)

v0 = c(rep("a",5),rep("b",5))
v1 = 1:10
DF1 <- data.frame(v0,v1)

n <- 2

xx <- bind_cols(DF1,
                map_dfc(1:n,
                        ~ lag(DF1$v1, .x))) %>%  # simple loop creating lagged columns
  group_by(v0, v1) %>%   # the columns we don't want to include in the list
  nest() %>%
  mutate(my_list = list(set_names(unlist(data), NULL))) # turn each one-row tibble into an unnamed vector
xx
# A tibble: 10 x 4
# Groups:   v0, v1 [10]
#   v0       v1 data             my_list  
#   <chr> <int> <list>           <list>   
# 1 a         1 <tibble [1 x 2]> <int [2]>
# 2 a         2 <tibble [1 x 2]> <int [2]>
# 3 a         3 <tibble [1 x 2]> <int [2]>
# 4 a         4 <tibble [1 x 2]> <int [2]>
# 5 a         5 <tibble [1 x 2]> <int [2]>
# 6 b         6 <tibble [1 x 2]> <int [2]>
# 7 b         7 <tibble [1 x 2]> <int [2]>
# 8 b         8 <tibble [1 x 2]> <int [2]>
# 9 b         9 <tibble [1 x 2]> <int [2]>
#10 b        10 <tibble [1 x 2]> <int [2]>

xx$my_list
# [[1]]
# [1] NA NA
# 
# [[2]]
# [1]  1 NA
# 
# [[3]]
# [1] 2 1
# 
# [[4]]
# [1] 3 2
# 
# [[5]]
# [1] 4 3
# 
# [[6]]
# [1] 5 4
# 
# [[7]]
# [1] 6 5
# 
# [[8]]
# [1] 7 6
# 
# [[9]]
# [1] 8 7
# 
# [[10]]
# [1] 9 8
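Note that lag(DF1$v1, .x) works on the whole column and ignores v0, which is why element 6 above is 5 4 even though row 6 starts group b. A base-R sketch of the same "one lagged vector per k" idea that lags within each group and never adds the lagged vectors to the data frame; base R's ave() is swapped in for dplyr::lag(), and lags and my_list are illustrative names:

```r
v0 <- c(rep("a", 5), rep("b", 5))
v1 <- 1:10
DF1 <- data.frame(v0, v1)
n <- 2

# Column k holds v1 lagged by k positions within each v0 group
# (NA where the group has no k-th earlier value). Nothing is added to DF1 here.
lags <- do.call(cbind, lapply(seq_len(n), function(k)
  ave(DF1$v1, DF1$v0, FUN = function(v) c(rep(NA, k), v)[seq_along(v)])))

# One list element per row: c(lag 1, lag 2, ..., lag n).
DF1$my_list <- lapply(seq_len(nrow(lags)), function(i) lags[i, ])
```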

EDIT: I'm actually not sure what you want. If you just want the lagged values concatenated into one string per row, it's clearer to do it like this:

lagged_cols <- map_dfc(1:n,
                       ~ lag(DF1$v1, .x))

apply(lagged_cols, 1, paste, collapse=" ")
# [1] "NA NA" "1 NA"  "2 1"   "3 2"   "4 3"   "5 4"   "6 5"   "7 6"   "8 7"  
# [10] "9 8"  
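Because paste() is vectorized over its arguments, the row-wise apply() can also be replaced by a single do.call(). The sketch below rebuilds the same ungrouped lagged vectors in base R so it runs without purrr; lagged and lagged_str are illustrative names.

```r
v1 <- 1:10
n <- 2

# Element k is v1 shifted down by k positions (ungrouped, like lag(DF1$v1, k)).
lagged <- lapply(seq_len(n), function(k) c(rep(NA, k), v1)[seq_along(v1)])

# paste() joins the i-th values of all n vectors, giving one string per row.
lagged_str <- do.call(paste, lagged)
```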
