Assign a value to a group based on a subgroup

Question

In R, I have a data frame that looks like this:

structure(
list(
`Family ID` = c("1", "1", "1", "2", "2", "2","3", "3", "3", "3", "4", "4", "4", "4"),
`Subject ID` = c("1","2", "4", "1", "2", "4", "1", "2", "4", "5", "1", "2", "4", "5"),
X = c("1", "2", "1", "1", "2", "2", "2", "1", "2", "1", "1","2", "2", "2"), 
Y = c("1", "2", "2", "1", "2", "2", "1", "1","2", "2", "2", "1", "2", "2")
), row.names = 2:15, class = "data.frame"
)

#>    Family ID Subject ID X Y
#> 2          1          1 1 1
#> 3          1          2 2 2
#> 4          1          4 1 2
#> 5          2          1 1 1
#> 6          2          2 2 2
#> 7          2          4 2 2
#> 8          3          1 2 1
#> 9          3          2 1 1
#> 10         3          4 2 2
#> 11         3          5 1 2
#> 12         4          1 1 2
#> 13         4          2 2 1
#> 14         4          4 2 2
#> 15         4          5 2 2

Created on 2021-04-15 by the reprex package (v0.3.0)

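My goal is to create a new column Z that is 1 for every row sharing a Family ID if and only if a subject with Subject ID 4 or 5 in that family has the value 1 in column X or column Y, and 0 otherwise. For this example, the result would look like this: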

#>    Family ID Subject ID X Y Z
#> 2          1          1 1 1 1
#> 3          1          2 2 2 1
#> 4          1          4 1 2 1
#> 5          2          1 1 1 0
#> 6          2          2 2 2 0
#> 7          2          4 2 2 0
#> 8          3          1 2 1 1
#> 9          3          2 1 1 1
#> 10         3          4 2 2 1
#> 11         3          5 1 2 1
#> 12         4          1 1 2 0
#> 13         4          2 2 1 0
#> 14         4          4 2 2 0
#> 15         4          5 2 2 0


Any help is appreciated. Apologies in advance, as I am new to this.

Tags: r

Answer


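After grouping by 'FamilyID', subset the 'X' and 'Y' columns to the rows where SubjectID is 4 or 5, check whether any value equals 1, and combine the two logical results with the OR (|) operator. The leading + converts the resulting TRUE/FALSE to 1/0.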

library(dplyr)
df1 %>%
   group_by(FamilyID) %>%
   # TRUE if any subject 4 or 5 in the family has X == 1 or Y == 1
   mutate(Z = +(any(X[SubjectID %in% 4:5] == 1) |
                any(Y[SubjectID %in% 4:5] == 1))) %>%
   ungroup()

Output

# A tibble: 13 x 5
#   FamilyID SubjectID     X     Y     Z
#      <int>     <int> <int> <int> <int>
# 1        1         1     1     1     1
# 2        1         2     2     2     1
# 3        1         4     1     2     1
# 4        2         1     1     1     0
# 5        2         2     2     2     0
# 6        3         1     2     1     1
# 7        3         2     1     1     1
# 8        3         4     2     2     1
# 9        3         5     1     2     1
#10        4         1     2     2     0
#11        4         2     2     2     0
#12        4         4     2     2     0
#13        4         5     2     2     0
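
The answer's df1 uses integer columns and names without spaces, while the data frame in the question has character columns named `Family ID` and `Subject ID`. The following is a minimal sketch of the same grouped check applied to the question's data as posted; the names df and res are illustrative, and comparing against the strings "1", "4" and "5" is an assumption made to match the character columns.

```r
library(dplyr)

# Data exactly as posted in the question (all columns are character)
df <- structure(
  list(
    `Family ID` = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3", "4", "4", "4", "4"),
    `Subject ID` = c("1", "2", "4", "1", "2", "4", "1", "2", "4", "5", "1", "2", "4", "5"),
    X = c("1", "2", "1", "1", "2", "2", "2", "1", "2", "1", "1", "2", "2", "2"),
    Y = c("1", "2", "2", "1", "2", "2", "1", "1", "2", "2", "2", "1", "2", "2")
  ),
  row.names = 2:15, class = "data.frame"
)

res <- df %>%
  group_by(`Family ID`) %>%   # backticks are needed because the name contains a space
  mutate(
    # TRUE for the whole family if any row of subject 4 or 5 has a "1" in X or Y
    Z = as.integer(any(`Subject ID` %in% c("4", "5") & (X == "1" | Y == "1")))
  ) %>%
  ungroup()

res$Z
#> [1] 1 1 1 0 0 0 1 1 1 1 0 0 0 0
```

This reproduces the Z column requested in the question. In the answer's df1, family 2 has no subject 4 or 5 at all; any() over an empty selection returns FALSE, which is why that family gets 0.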

Or using base R

df1$Z <- with(df1, +(FamilyID %in% FamilyID[SubjectID %in% 
       4:5][rowSums(cbind(X, Y)[SubjectID %in% 4:5,] == 1) > 0]))
df1$Z
#[1] 1 1 1 0 0 1 1 1 1 0 0 0 0
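
The base R one-liner is dense, so here is a sketch that splits the same idea into named steps. The intermediate names (is_target, has_one, hit_families, z_steps) are illustrative and not part of the original answer; df1 is rebuilt inline so the block runs on its own.

```r
# Same values as the answer's df1
df1 <- data.frame(
  FamilyID  = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
  SubjectID = c(1L, 2L, 4L, 1L, 2L, 1L, 2L, 4L, 5L, 1L, 2L, 4L, 5L),
  X = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L),
  Y = c(1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L)
)

# Step 1: rows that belong to subject 4 or 5
is_target <- df1$SubjectID %in% 4:5

# Step 2: rows with a 1 in X or Y
has_one <- df1$X == 1 | df1$Y == 1

# Step 3: families that contain at least one row satisfying both conditions
hit_families <- unique(df1$FamilyID[is_target & has_one])

# Step 4: flag every row of those families with 1, all other rows with 0
z_steps <- as.integer(df1$FamilyID %in% hit_families)

z_steps
#> [1] 1 1 1 0 0 1 1 1 1 0 0 0 0
```

The result matches the one-liner's output; the step-by-step form is simply easier to inspect when a family is flagged unexpectedly.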

Data

df1 <- structure(list(FamilyID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L), SubjectID = c(1L, 2L, 4L, 1L, 2L, 1L, 2L, 4L, 
5L, 1L, 2L, 4L, 5L), X = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 
2L, 2L, 2L, 2L), Y = c(1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -13L))
