r - Assign a value to a group based on a subgroup
Problem description
In R, I have a df that looks like this:
structure(
list(
`Family ID` = c("1", "1", "1", "2", "2", "2","3", "3", "3", "3", "4", "4", "4", "4"),
`Subject ID` = c("1","2", "4", "1", "2", "4", "1", "2", "4", "5", "1", "2", "4", "5"),
X = c("1", "2", "1", "1", "2", "2", "2", "1", "2", "1", "1","2", "2", "2"),
Y = c("1", "2", "2", "1", "2", "2", "1", "1","2", "2", "2", "1", "2", "2")
), row.names = 2:15, class = "data.frame"
)
#>    Family ID Subject ID X Y
#> 2          1          1 1 1
#> 3          1          2 2 2
#> 4          1          4 1 2
#> 5          2          1 1 1
#> 6          2          2 2 2
#> 7          2          4 2 2
#> 8          3          1 2 1
#> 9          3          2 1 1
#> 10         3          4 2 2
#> 11         3          5 1 2
#> 12         4          1 1 2
#> 13         4          2 2 1
#> 14         4          4 2 2
#> 15         4          5 2 2
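Created on 2021-04-15 by the reprex package (v0.3.0)
My goal is to create a new column that contains the value 1 for everyone sharing the same Family ID if, and only if, a subject with Subject ID 4 or 5 in that family has the value 1 in column X or column Y; all other families get 0. For this example, the result would look like this: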
#>    Family ID Subject ID X Y Z
#> 2          1          1 1 1 1
#> 3          1          2 2 2 1
#> 4          1          4 1 2 1
#> 5          2          1 1 1 0
#> 6          2          2 2 2 0
#> 7          2          4 2 2 0
#> 8          3          1 2 1 1
#> 9          3          2 1 1 1
#> 10         3          4 2 2 1
#> 11         3          5 1 2 1
#> 12         4          1 1 2 0
#> 13         4          2 2 1 0
#> 14         4          4 2 2 0
#> 15         4          5 2 2 0
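Any help is appreciated. Apologies in advance, as I am new to this.
Solution
After grouping by 'FamilyID', subset the 'X' and 'Y' columns to the rows where SubjectID is 4 or 5, check whether any of those values equals 1, and combine the two logical expressions with the OR (|) operator. The unary + then converts the resulting TRUE/FALSE to 1/0.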
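library(dplyr)
df1 %>%
   group_by(FamilyID) %>%
   # per family: TRUE if any subject with ID 4 or 5 has X == 1 or Y == 1;
   # the unary + converts that logical to integer 0/1
   mutate(Z = +(any(X[SubjectID %in% 4:5] == 1)|
        any(Y[SubjectID %in% 4:5] == 1))) %>%
   ungroup()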
-output
# A tibble: 13 x 5
#   FamilyID SubjectID     X     Y     Z
#      <int>     <int> <int> <int> <int>
# 1        1         1     1     1     1
# 2        1         2     2     2     1
# 3        1         4     1     2     1
# 4        2         1     1     1     0
# 5        2         2     2     2     0
# 6        3         1     2     1     1
# 7        3         2     1     1     1
# 8        3         4     2     2     1
# 9        3         5     1     2     1
#10        4         1     2     2     0
#11        4         2     2     2     0
#12        4         4     2     2     0
#13        4         5     2     2     0
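To see what the grouped expression does inside a single family, here is a minimal base R sketch that evaluates it by hand for family 3 and family 2 of df1. The names fam3, fam2, sel3, sel2, z3 and z2 are illustrative only and do not appear in the answer. Family 2 in df1 has no subject with ID 4 or 5, so the subset is empty and any() returns FALSE, which is why that family gets 0.

```r
# Rows of family 3 from df1 (subjects 1, 2, 4, 5)
fam3 <- data.frame(SubjectID = c(1L, 2L, 4L, 5L),
                   X = c(2L, 1L, 2L, 1L),
                   Y = c(1L, 1L, 2L, 2L))

sel3 <- fam3$SubjectID %in% 4:5                 # FALSE FALSE TRUE TRUE
x_hit <- any(fam3$X[sel3] == 1)                 # subject 5 has X == 1
y_hit <- any(fam3$Y[sel3] == 1)                 # neither subject has Y == 1
z3 <- +(x_hit | y_hit)                          # logical -> integer
z3
# [1] 1

# Rows of family 2 from df1 (subjects 1 and 2 only)
fam2 <- data.frame(SubjectID = c(1L, 2L),
                   X = c(1L, 2L),
                   Y = c(1L, 2L))

sel2 <- fam2$SubjectID %in% 4:5                 # all FALSE, so the subsets are empty
z2 <- +(any(fam2$X[sel2] == 1) | any(fam2$Y[sel2] == 1))
z2                                              # any() of an empty vector is FALSE
# [1] 0
```

Inside mutate(), this single value per family is recycled to every row of the group.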
Or using base R
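# flag every family in which a subject with ID 4 or 5 has a 1 in X or Y;
# drop = FALSE keeps the matrix shape even when only one row matches
df1$Z <- with(df1, +(FamilyID %in% FamilyID[SubjectID %in%
     4:5][rowSums(cbind(X, Y)[SubjectID %in% 4:5, , drop = FALSE] == 1) > 0]))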
df1$Z
#[1] 1 1 1 0 0 1 1 1 1 0 0 0 0
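The base R one-liner is dense, so the following sketch splits it into named steps. The intermediate names (sel, xy, has_one, flagged) are illustrative and not part of the answer, and drop = FALSE is included here as a guard so the subset stays a matrix if only one row matches.

```r
df1 <- structure(list(
  FamilyID  = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
  SubjectID = c(1L, 2L, 4L, 1L, 2L, 1L, 2L, 4L, 5L, 1L, 2L, 4L, 5L),
  X = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L),
  Y = c(1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L)),
  class = "data.frame", row.names = c(NA, -13L))

# 1. Rows belonging to subjects 4 or 5
sel <- df1$SubjectID %in% 4:5

# 2. X and Y for those rows, kept as a two-column matrix
xy <- cbind(X = df1$X, Y = df1$Y)[sel, , drop = FALSE]

# 3. Which of those rows have a 1 in X or Y
has_one <- rowSums(xy == 1) > 0

# 4. Families that those rows belong to
flagged <- df1$FamilyID[sel][has_one]
flagged
# [1] 1 3

# 5. Mark every row whose family was flagged
Z <- +(df1$FamilyID %in% flagged)
Z
# [1] 1 1 1 0 0 1 1 1 1 0 0 0 0
```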
data
df1 <- structure(list(FamilyID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L), SubjectID = c(1L, 2L, 4L, 1L, 2L, 1L, 2L, 4L,
5L, 1L, 2L, 4L, 5L), X = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L,
2L, 2L, 2L, 2L), Y = c(1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -13L))
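Note that df1 above uses syntactic column names and integer columns, whereas the data frame in the question has names containing spaces ("Family ID", "Subject ID") and stores everything as character. A sketch of the same dplyr approach applied to the question's data might look like the following; the names df and result are assumptions, the row names 2:15 are left out for brevity, and the comparisons use character literals to match the column types.

```r
library(dplyr)

# The question's data: non-syntactic column names, all columns character
df <- data.frame(
  `Family ID`  = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3", "4", "4", "4", "4"),
  `Subject ID` = c("1", "2", "4", "1", "2", "4", "1", "2", "4", "5", "1", "2", "4", "5"),
  X = c("1", "2", "1", "1", "2", "2", "2", "1", "2", "1", "1", "2", "2", "2"),
  Y = c("1", "2", "2", "1", "2", "2", "1", "1", "2", "2", "2", "1", "2", "2"),
  check.names = FALSE  # keep the spaces in the column names
)

result <- df %>%
  group_by(`Family ID`) %>%  # backticks are needed for names with spaces
  mutate(Z = +(any(X[`Subject ID` %in% c("4", "5")] == "1") |
               any(Y[`Subject ID` %in% c("4", "5")] == "1"))) %>%
  ungroup()

result$Z
# [1] 1 1 1 0 0 0 1 1 1 1 0 0 0 0
```

This matches the Z column requested in the question.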