r - Sequence by group AND when a condition is met in R
Problem description
This is my data frame:
df <- data.frame(id=c("124", "124", "124", "456", "456", "456", "456", "8675", "8675", "8675", "8675", "8675", "124", "124", "124", "124"),
condition=c("beg", "mid", "end", "beg", "mid", "mid", "end", "beg", "mid", "mid", "mid", "end", "beg", "mid", "mid", "end"),
school=c("a", "b", "c", "d", "e", "e", "f", "g", "h", "h", "h", "u", "j", "k", "k", "l"),
start_date=c("20000105", "20000601", "20000901", "20000105", "20000601", "20000620", "20000901", "19990805", "20000105", "20000601", "20000901", "20010115", "20060105", "20060701", "20061001", "20070110"),
end_date=c("20000501", "20000801", "20001215", "20000501", "20000801", "20001210", "20001215", "19991213", "20000501", "20000801", "20001215", "20010515", "20060501", "20060915", "20061215", "20070510"))
Many of the questions I have seen number rows sequentially within a group:
df_edited <- df %>%
group_by(id, idx = cumsum(seq == 1L)) %>%
mutate(counter = row_number()) %>%
ungroup %>%
select(-idx)
or restart from 1 once a condition is met:
df_edited$num <- ave(df_edited$id, df_edited$condition, FUN = seq_along)
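I have bookmarked those, but they don't apply to what I want to do now. What I want is a group number that stays the same within the same id and changes after df$condition == "end":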
id condition school start_date end_date group
124 beg a 20000105 20000501 1
124 mid b 20000601 20000801 1
124 end c 20000901 20001215 1
456 beg d 20000105 20000501 2
456 mid e 20000601 20000801 2
456 mid e 20000620 20001210 2
456 end f 20000901 20001215 2
8675 beg g 19990805 19991213 3
8675 mid h 20000105 20000501 3
8675 mid h 20000601 20000801 3
8675 mid h 20000901 20001215 3
8675 end u 20010115 20010515 3
124 beg j 20060105 20060501 4
124 mid k 20060701 20060915 4
124 mid k 20061001 20061215 4
124 end l 20070110 20070510 4
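Can anyone help? Thanks!
Each ID can go through beg, mid, end more than once, but even when the ID is the same I still want the group numbers to differ.
Solution
If we need a group index, use rleid from data.table: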
library(dplyr)
library(data.table)
df %>%
mutate(group = rleid(id))
# id condition school start_date end_date group
#1 124 beg a 20000105 20000501 1
#2 124 mid b 20000601 20000801 1
#3 124 end c 20000901 20001215 1
#4 456 beg d 20000105 20000501 2
#5 456 mid e 20000601 20000801 2
#6 456 mid e 20000620 20001210 2
#7 456 end f 20000901 20001215 2
#8 8675 beg g 19990805 19991213 3
#9 8675 mid h 20000105 20000501 3
#10 8675 mid h 20000601 20000801 3
#11 8675 mid h 20000901 20001215 3
#12 8675 end u 20010115 20010515 3
#13 124 beg j 20060105 20060501 4
#14 124 mid k 20060701 20060915 4
#15 124 mid k 20061001 20061215 4
#16 124 end l 20070110 20070510 4
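To see why this works, it may help to look at what rleid does on its own. The sketch below uses a made-up toy vector (not the question's data): the counter goes up each time the value differs from the previous element, so a value that comes back later gets a new number.

```r
library(data.table)

# Toy vector for illustration only: "a" appears in two separate runs.
x <- c("a", "a", "b", "b", "b", "a")

# rleid() increments its counter whenever the value changes from the previous element.
run_id <- rleid(x)
print(run_id)
# [1] 1 1 2 2 2 3
```

This is why the second block of id 124 gets group 4 rather than 1. Note that it depends on the two blocks not being adjacent: two back-to-back cycles of the same id would fall into one run.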
Or in base R (as.character guards against id being a factor, which rle() does not accept):
df$group <- with(rle(as.character(df$id)), rep(seq_along(values), lengths))
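Both versions above key on runs of id, so they would merge two cycles of the same id if those cycles happened to sit next to each other. Since the question asks for the group to change after condition == "end", a possible alternative is to count the "end" rows directly. This is only a sketch, and it assumes the rows are already in order and that every cycle finishes with exactly one "end" row; the helper names is_end and starts_group are made up for illustration.

```r
# Rebuild only the two columns needed from the question's data.
df <- data.frame(
  id = rep(c("124", "456", "8675", "124"), times = c(3, 4, 5, 4)),
  condition = c("beg", "mid", "end",
                "beg", "mid", "mid", "end",
                "beg", "mid", "mid", "mid", "end",
                "beg", "mid", "mid", "end")
)

# TRUE on rows that close a cycle (assumption: exactly one "end" per cycle).
is_end <- df$condition == "end"

# A new group starts on the first row and on every row that follows an "end" row.
starts_group <- c(TRUE, head(is_end, -1))

# The running count of group starts is the group number.
df$group <- cumsum(starts_group)
print(df$group)
# [1] 1 1 1 2 2 2 2 3 3 3 3 3 4 4 4 4
```

On the question's data this gives the same numbering as rleid(id). It never looks at id, so it is only appropriate when "end" reliably marks the last row of each cycle.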