Create an interval variable every x rows in dplyr

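Problem description

What is the most efficient way to create a new variable that increases by x units after every x rows? For example, I have a data frame: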

 d <- data.frame(group_var = c('a', 'b', 'c'),
                 y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
                       1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
                       1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21))

I want to create a new variable that starts at x and increases by x every x rows, so that I end up with a data frame like this:

 d <- data.frame(group_var = c('a', 'b', 'c'),
                 y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
                       1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
                       1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21),
                 z = c(5,5,5,5,5,10,10,10,10,10,15,15,15,15,15,20,20,20,20,20,20,
                       5,5,5,5,5,10,10,10,10,10,15,15,15,15,15,20,20,20,20,20,20,
                       5,5,5,5,5,10,10,10,10,10,15,15,15,15,15,20,20,20,20,20,20))

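In addition, when there is a remainder, as in the data frame above, I want those rows to be grouped with the previous block (so y = 21 gets z = 20). Note that the target data frame keeps the same number of rows as the original.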

Tags: r, dplyr, grouping, sequence

Solution


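We can create a grouping variable from the diff() of 'y', then build 'z' with gl() and multiply by 5. A trailing block with fewer than 5 rows is set to NA and then filled down from the previous block.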
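To make the two building blocks concrete, here is a minimal base R sketch on a shortened vector. The names y_small, grp and z_run are illustrative only and do not appear in the original answer.

```r
# Shortened stand-in for the 'y' column: two runs, 1..7 and then 1..7 again
y_small <- c(1:7, 1:7)

# diff() is negative exactly where the sequence restarts, so the cumulative
# sum of that flag numbers each run (TRUE is prepended for the first row)
grp <- cumsum(c(TRUE, diff(y_small) < 0))
print(grp)

# gl(n, k, length) builds factor levels 1..n, each repeated k times and cut
# to 'length' values; as.integer() * 5 turns the level codes into 5, 10, ...
z_run <- as.integer(gl(7, 5, 7)) * 5
print(z_run)
```

In this sketch the last two values of z_run form an incomplete block of 10s; that is the kind of remainder the replace() and fill() steps are meant to handle.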

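library(dplyr)
library(tidyr)
d1 <- d %>% 
    # start a new group each time 'y' drops, i.e. the sequence restarts
    group_by(grp = cumsum(c(TRUE, diff(y) < 0))) %>% 
    # label blocks of 5 rows as 5, 10, 15, ...
    mutate(z = as.integer(gl(n(), 5, n())) * 5,
           # blank out a trailing block that has fewer than 5 rows
           z = replace(z, ave(z, z, FUN = length) < 5, NA)) %>% 
    ungroup() %>% 
    # carry the last complete block's value down into the remainder
    fill(z) %>%
    select(-grp)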

-output

as.data.frame(d1)
   group_var  y  z
1          a  1  5
2          b  2  5
3          c  3  5
4          a  4  5
5          b  5  5
6          c  6 10
7          a  7 10
8          b  8 10
9          c  9 10
10         a 10 10
11         b 11 15
12         c 12 15
13         a 13 15
14         b 14 15
15         c 15 15
16         a 16 20
17         b 17 20
18         c 18 20
19         a 19 20
20         b 20 20
21         c 21 20
22         a  1  5
23         b  2  5
24         c  3  5
25         a  4  5
26         b  5  5
27         c  6 10
28         a  7 10
29         b  8 10
30         c  9 10
31         a 10 10
32         b 11 15
33         c 12 15
34         a 13 15
35         b 14 15
36         c 15 15
37         a 16 20
38         b 17 20
39         c 18 20
40         a 19 20
41         b 20 20
42         c 21 20
43         a  1  5
44         b  2  5
45         c  3  5
46         a  4  5
47         b  5  5
48         c  6 10
49         a  7 10
50         b  8 10
51         c  9 10
52         a 10 10
53         b 11 15
54         c 12 15
55         a 13 15
56         b 14 15
57         c 15 15
58         a 16 20
59         b 17 20
60         c 18 20
61         a 19 20
62         b 20 20
63         c 21 20
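
The question asks about an arbitrary step x, while the code above hardcodes 5. As a hedged alternative, the same labels can be computed arithmetically. This is only a sketch: interval_label is a hypothetical helper, not part of dplyr or of the original answer, and it assumes that a group shorter than x rows should simply get the value x.

```r
library(dplyr)

# Hypothetical helper: label n rows in blocks of size x, folding a short
# trailing block into the previous complete block
interval_label <- function(n, x) {
  block <- ceiling(seq_len(n) / x)   # block index per row: 1,1,...,2,2,...
  full  <- max(n %/% x, 1)           # number of complete blocks, at least 1
  pmin(block, full) * x              # cap the remainder at the last full block
}

# Same data as in the question (y written with rep() for brevity)
d <- data.frame(group_var = c('a', 'b', 'c'),
                y = rep(as.numeric(1:21), times = 3))

d2 <- d %>%
  group_by(grp = cumsum(c(TRUE, diff(y) < 0))) %>%   # one group per run of y
  mutate(z = interval_label(n(), 5)) %>%
  ungroup() %>%
  select(-grp)
```

This variant avoids the NA-and-fill step and does not need tidyr. Changing the second argument of interval_label changes the step.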
