r - Improving a dplyr solution: creating a variable by conditional ordering (position) based on other information
Problem description
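I am working on a dataset in which each participant (ID) was assessed 1, 2, or 3 times. It is a longitudinal study. Unfortunately, when the first analyst coded the dataset, she/he did not record which assessment each row belongs to.
Because every participant has age information (in months), it is easy to identify which assessment came first, which came second, and so on: a participant is younger at the first assessment than at the second, and so on.
I used tidyverse tools to handle this and everything works. However, I am sure there are many other, more elegant solutions, and I came to this forum to ask for one. Could someone give me some ideas on how to make this code shorter and cleaner?
Here is fake data and code to reproduce the problem: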
library(dplyr)

set.seed(1)  # make the fake data reproducible
# 6 children x 3 assessments = 18 rows; the x columns are recycled to length 18
ds <- data.frame(id = rep(1:6, times = 3),
                 months = round(rnorm(18, mean = 12, sd = 2), 0),
                 x1 = sample(0:2),
                 x2 = sample(0:2),
                 x3 = sample(0:2),
                 x4 = sample(0:2))
# add how many times each child was assessed
ds <- ds %>% group_by(id) %>% mutate(how_many = n())
# add the youngest, oldest, and median age per child
ds <- ds %>%
  group_by(id) %>%
  mutate(first = min(months),
         max = max(months),
         med = median(months))
# label the first and third evaluations (the second stays NA for now)
ds <- ds %>%
  mutate(group = case_when(how_many == 3 & months == first ~ "First evaluation",
                           how_many == 3 & months == max ~ "Third evaluation",
                           TRUE ~ NA_character_))
# label the remaining (middle) row of children evaluated three times
ds <- ds %>% mutate(group = if_else(is.na(group), "Second evaluation", group))
Here is my original code:
# `dataset` is the real study data (not shown here)
temp <- dataset %>% select(idind, arm, infant_sex, infant_age_months)
# add how many times each child was assessed
temp <- temp %>% group_by(idind) %>% mutate(how_many = n())
# add the youngest, oldest, and median age per child
temp <- temp %>%
  group_by(idind) %>%
  mutate(first = min(infant_age_months),
         max = max(infant_age_months),
         med = median(infant_age_months))
# label children evaluated only once
temp <- temp %>%
  mutate(group = case_when(how_many == 1 ~ "First evaluation"))
# label children evaluated twice (and keep all previous results)
temp <- temp %>%
  mutate(group = case_when(how_many == 2 & infant_age_months == first ~ "First evaluation",
                           how_many == 2 & infant_age_months == max ~ "Second evaluation",
                           TRUE ~ group))
# label the first and third evaluations of children evaluated three times
temp <- temp %>%
  mutate(group = case_when(how_many == 3 & infant_age_months == first ~ "First evaluation",
                           how_many == 3 & infant_age_months == max ~ "Third evaluation",
                           TRUE ~ group))
# label the remaining (middle) row of children evaluated three times
temp <- temp %>% mutate(group = if_else(is.na(group), "Second evaluation", group))
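Please note that I used the search box before asking, and I think other people may run into the same problem when programming. Thank you very much.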
Solution
Here you go. I used rank() within each id to get the order of the evaluations.
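library(dplyr)

# 6 children x 3 assessments = 18 rows; the x columns are recycled to length 18
ds <- data.frame(id = rep(1:6, times = 3),
                 months = round(rnorm(18, mean = 12, sd = 2), 0),
                 x1 = sample(0:2),
                 x2 = sample(0:2),
                 x3 = sample(0:2),
                 x4 = sample(0:2))
# rank ages within each child; ties are broken by row order
ds2 <- ds %>%
  group_by(id) %>%
  mutate(rank = rank(months, ties.method = "first"))
# turn the rank (1, 2, 3) into a label by indexing a lookup vector
labels <- c("First", "Second", "Third")
ds2$labels <- labels[ds2$rank]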
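The fake data above gives every child exactly three assessments, whereas the real dataset has children with one, two, or three. The following is a minimal sketch of the same rank() idea on uneven data. The `visits` data frame, the `visit` and `group` column names, and the `evaluation_labels` vector are made up for illustration and are not part of the original post. It also assumes that when two assessments share the same rounded age, the rows are already in chronological order, because ties.method = "first" breaks ties by row position.

```r
library(dplyr)

# Hypothetical toy data: child 1 has three visits, child 2 has two, child 3 has one.
# Child 2 has the same rounded age at both visits, to show how ties are handled.
visits <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3),
  months = c(14, 10, 12, 11, 11, 9)
)

# Lookup vector: position 1 is the youngest age, position 3 the oldest
evaluation_labels <- c("First evaluation", "Second evaluation", "Third evaluation")

labelled <- visits %>%
  group_by(id) %>%
  mutate(
    how_many = n(),                                  # number of visits per child
    visit    = rank(months, ties.method = "first"),  # 1 = youngest; ties keep row order
    group    = factor(evaluation_labels[visit], levels = evaluation_labels)
  ) %>%
  ungroup() %>%
  arrange(id, visit)

print(labelled)
```

Because the rank is computed within each id, a child with a single visit gets "First evaluation" and a child with two visits gets "First" and "Second", so the separate case_when() steps for how_many == 1, 2, and 3 are not needed. Storing `group` as a factor with explicit levels keeps the labels in visit order in tables and plots.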