Numbering rows in R by group AND by sequence, restarting when a condition is met

Problem description

Here is my data frame:

    df <- data.frame(id=c("124", "124", "124", "456", "456", "456", "456", "8675", "8675", "8675", "8675", "8675", "124", "124", "124", "124"), 
            condition=c("beg", "mid", "end", "beg", "mid", "mid", "end", "beg", "mid", "mid", "mid", "end", "beg", "mid", "mid", "end"),
            school=c("a", "b", "c", "d", "e", "e", "f", "g", "h", "h", "h", "u", "j", "k", "k", "l"),
            start_date=c("20000105", "20000601", "20000901", "20000105", "20000601", "20000620", "20000901", "19990805", "20000105", "20000601", "20000901", "20010115", "20060105", "20060701", "20061001", "20070110"),
            end_date=c("20000501", "20000801", "20001215", "20000501", "20000801", "20001210", "20001215", "19991213", "20000501", "20000801", "20001215", "20010515", "20060501", "20060915", "20061215", "20070510"))

Many of the questions I have seen number the rows sequentially within a group:

df_edited <- df  %>% 
         group_by(id, idx = cumsum(seq == 1L)) %>% 
         mutate(counter = row_number()) %>% 
         ungroup %>% 
         select(-idx)

or restart the count from 1 once a condition is met:

df_edited$num <- ave(df_edited$id, df_edited$condition, FUN = seq_along)

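I have bookmarked those, but they do not do what I need now. What I want is a group number that stays the same within a run of one id and changes after df$condition == "end":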

id      condition   school  start_date  end_date    group
124     beg         a       20000105    20000501    1
124     mid         b       20000601    20000801    1
124     end         c       20000901    20001215    1
456     beg         d       20000105    20000501    2
456     mid         e       20000601    20000801    2
456     mid         e       20000620    20001210    2
456     end         f       20000901    20001215    2
8675    beg         g       19990805    19991213    3
8675    mid         h       20000105    20000501    3
8675    mid         h       20000601    20000801    3
8675    mid         h       20000901    20001215    3
8675    end         u       20010115    20010515    3
124     beg         j       20060105    20060501    4
124     mid         k       20060701    20060915    4
124     mid         k       20061001    20061215    4
124     end         l       20070110    20070510    4

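Can anyone help? Thanks!

Each id can go through beg, mid, end more than once, but even when the id is the same I still want each pass to get a different group number.

Tags: r, grouping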

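Solution

If we need a group index, use rleid from data.table, which assigns a new index each time id changes from one row to the next: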

library(dplyr)
library(data.table)
df %>%
   mutate(group = rleid(id))
#      id condition school start_date end_date group
#1   124       beg      a   20000105 20000501     1
#2   124       mid      b   20000601 20000801     1
#3   124       end      c   20000901 20001215     1
#4   456       beg      d   20000105 20000501     2
#5   456       mid      e   20000601 20000801     2
#6   456       mid      e   20000620 20001210     2
#7   456       end      f   20000901 20001215     2
#8  8675       beg      g   19990805 19991213     3
#9  8675       mid      h   20000105 20000501     3
#10 8675       mid      h   20000601 20000801     3
#11 8675       mid      h   20000901 20001215     3
#12 8675       end      u   20010115 20010515     3
#13  124       beg      j   20060105 20060501     4
#14  124       mid      k   20060701 20060915     4
#15  124       mid      k   20061001 20061215     4
#16  124       end      l   20070110 20070510     4
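
To see why this gives the second block of id 124 its own number, rleid can be tried on a short vector. This is only a sketch: the vector `ids` is made up for illustration and is not part of the question's data.

```r
library(data.table)

# Toy vector (hypothetical): "124" appears in two separate runs.
ids <- c("124", "124", "456", "124")

# rleid() gives each run of identical consecutive values its own index,
# so the second run of "124" is numbered separately from the first.
run_index <- rleid(ids)
run_index
# [1] 1 1 2 3
```

The index depends on row order, so the data must already be sorted the way the groups should be read.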

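Or in base R:

# as.character() guards against id being a factor (the data.frame() default before R 4.0), which rle() rejects
df$group <- with(rle(as.character(df$id)), rep(seq_along(values), lengths))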
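
Both approaches key on changes in id, so they would merge two back-to-back passes of the same id into one group. The question asks for the number to change after condition == "end", and a possible alternative, sketched here under the assumption that every pass finishes with exactly one "end" row, is to count those rows with cumsum. The names `starts_new_group` and `toy` are illustrative, and only the two relevant columns of the question's data are rebuilt.

```r
# Rebuild the id and condition columns from the question.
df <- data.frame(
  id = rep(c("124", "456", "8675", "124"), times = c(3, 4, 5, 4)),
  condition = c("beg", "mid", "end",
                "beg", "mid", "mid", "end",
                "beg", "mid", "mid", "mid", "end",
                "beg", "mid", "mid", "end"),
  stringsAsFactors = FALSE
)

# TRUE on the first row and on every row that directly follows an "end" row.
starts_new_group <- c(TRUE, head(df$condition == "end", -1))

# The running count of group starts is the group number.
df$group <- cumsum(starts_new_group)

# Hypothetical case: one id with two consecutive passes.
toy <- data.frame(id = rep("124", 4),
                  condition = c("beg", "end", "beg", "end"),
                  stringsAsFactors = FALSE)
toy$by_id  <- with(rle(toy$id), rep(seq_along(values), lengths))
toy$by_end <- cumsum(c(TRUE, head(toy$condition == "end", -1)))
```

On the question's data this yields the same groups 1 to 4 as the id-based answers. On `toy`, `by_id` stays at 1 for all four rows while `by_end` gives 1, 1, 2, 2.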
