r - R: previous n elements of another column in a data frame
Problem description
I would like to add a new column that contains the vector (or list) of the previous n elements from another column. The computation is done after grouping.
Here is an example with n=2. Input:
v0 = c(rep("a",5),rep("b",5))
v1 = 1:10
DF1 <- data.frame(v0,v1)
> DF1
v0 v1
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 b 6
7 b 7
8 b 8
9 b 9
10 b 10
Output: The new column should be an integer vector (or a list) and contain the following values:
> DF2_L
v0 v1 myL
1 a 1 NA, NA
2 a 2 1, NA
3 a 3 2, 1
4 a 4 3, 2
5 a 5 4, 3
6 b 6 5, 4
7 b 7 6, 5
8 b 8 7, 6
9 b 9 8, 7
10 b 10 9, 8
A simple solution would be
library(dplyr)
DF2 <- DF1 %>% group_by(v0) %>%
  mutate(i1 = lag(v1, 1), i2 = lag(v1, 2),       # one helper column per lag
         myL = mapply(c, i1, i2, SIMPLIFY = FALSE)) %>%
  select(-c(i1, i2))                             # drop the helper columns
But this is only a simplified table. For my computations n is 36, which means I would need to create 36 new "dummy" columns for lag(v1,1), lag(v1,2) ... lag(v1,36), combine their values into a list, and then delete them. This is not convenient. There has to be another way.
I thought of using rollapply with FUN = list, but I get this error message:
t <- DF1 %>% group_by(v0) %>%
mutate( myL= rollapply(lag(v1),
2, fill=NA, align="right",
list))
Error: Problem with `mutate()` input `myL`.
x “x” : attempt to define invalid zoo object
i Input `myL` is `rollapply(lag(v1), 2, fill = NA, align = "right", list)`.
i The error occurred in group 1: v0 = "a".
Run `rlang::last_error()` to see where the error occurred.
When I use FUN = c, I get each element as a separate column:
t <- DF1 %>% group_by(v0) %>%
mutate( myL= rollapply(lag(v1),
2, fill=NA, align="right",
c))
v0 v1 myL[,1] [,2]
<fct> <int> <int> <int>
1 a 1 NA NA
2 a 2 NA 1
3 a 3 1 2
4 a 4 2 3
5 a 5 3 4
6 b 6 NA NA
7 b 7 NA 6
8 b 8 6 7
9 b 9 7 8
10 b 10 8 9
Solution
Would this correspond to what you want?
library(dplyr)
library(purrr)
library(tidyr)
v0 = c(rep("a",5),rep("b",5))
v1 = 1:10
DF1 <- data.frame(v0,v1)
n <- 2
xx <- bind_cols(DF1,
                map_dfc(1:n,
                        ~ lag(DF1$v1, .x))) %>% # simple loop creating lagged columns (lags run over the whole column, ignoring v0)
  group_by(v0, v1) %>% # the variables we don't want to include
  nest() %>%
  mutate(my_list = list(set_names(unlist(data), NULL))) # make each tibble into an unnamed vector
xx
# A tibble: 10 x 4
# Groups: v0, v1 [10]
# v0 v1 data my_list
# <chr> <int> <list> <list>
# 1 a 1 <tibble [1 x 2]> <int [2]>
# 2 a 2 <tibble [1 x 2]> <int [2]>
# 3 a 3 <tibble [1 x 2]> <int [2]>
# 4 a 4 <tibble [1 x 2]> <int [2]>
# 5 a 5 <tibble [1 x 2]> <int [2]>
# 6 b 6 <tibble [1 x 2]> <int [2]>
# 7 b 7 <tibble [1 x 2]> <int [2]>
# 8 b 8 <tibble [1 x 2]> <int [2]>
# 9 b 9 <tibble [1 x 2]> <int [2]>
#10 b 10 <tibble [1 x 2]> <int [2]>
xx$my_list
# [[1]]
# [1] NA NA
#
# [[2]]
# [1] 1 NA
#
# [[3]]
# [1] 2 1
#
# [[4]]
# [1] 3 2
#
# [[5]]
# [1] 4 3
#
# [[6]]
# [1] 5 4
#
# [[7]]
# [1] 6 5
#
# [[8]]
# [1] 7 6
#
# [[9]]
# [1] 8 7
#
# [[10]]
# [1] 9 8
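Note that the lags above are taken over the whole column, so row 6 (the first row of group "b") still sees 5 and 4 from group "a". If the lags should restart within each group, as the question's "after grouping" suggests, one possible sketch is below. It assumes a small helper, prev_n, which is hypothetical (not part of dplyr or any package) and builds the list directly, so no intermediate lag columns are needed even for n = 36.

library(dplyr)

# Hypothetical helper (not from any package): for each position i in x,
# return x[i-1], x[i-2], ..., x[i-n], with NA where no earlier element exists.
prev_n <- function(x, n) {
  lapply(seq_along(x), function(i) {
    idx <- i - seq_len(n)   # positions of the n previous elements
    idx[idx < 1L] <- NA     # positions before the start of x become NA
    x[idx]
  })
}

DF1 <- data.frame(v0 = c(rep("a", 5), rep("b", 5)), v1 = 1:10)
n <- 2

DF2 <- DF1 %>%
  group_by(v0) %>%
  mutate(myL = prev_n(v1, n)) %>%   # list column, computed per group
  ungroup()

DF2$myL[[3]]  # previous two values for row 3, most recent first
DF2$myL[[6]]  # first row of group "b": no earlier values within the group

Dropping the group_by(v0) line should give the ungrouped behaviour shown above instead.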
EDIT: I'm actually not sure what you want. If you just want a vector of the concatenated lagged values, it's clearer to do it like this:
lagged_cols <- map_dfc(1:n,
~ lag(DF1$v1, .x))
apply(lagged_cols, 1, paste, collapse=" ")
# [1] "NA NA" "1 NA" "2 1" "3 2" "4 3" "5 4" "6 5" "7 6" "8 7"
# [10] "9 8"
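The same pasted strings can also be produced without purrr or apply(). The following is only a base-R sketch: shift_by is a hypothetical stand-in for dplyr::lag(), and the lags are again taken over the whole column rather than per group.

v1 <- 1:10
n <- 2

# Hypothetical stand-in for dplyr::lag(): shift x down by k positions,
# padding the start with NA and keeping the original length.
shift_by <- function(x, k) c(rep(NA, k), x)[seq_along(x)]

# One shifted copy of v1 per lag, then paste them together element-wise.
lagged <- lapply(seq_len(n), function(k) shift_by(v1, k))
pasted <- do.call(paste, lagged)
pasted

Because paste() is vectorised over its arguments, do.call(paste, lagged) combines the i-th element of every lag in one call, which avoids looping over rows.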