Generalizing a data.frame subsetting function

Problem description

I have a toy data.frame with 4 columns (study, outcome, group, time). Suppose a user wants to know within which unique study values any other selected columns are constant or vary.

For example, if the user wants to know within which unique study values the outcome and group values are constant or vary, there are 4 possible cases:

  1. group is constant but outcome varies.
  2. outcome is constant but group varies.
  3. Both outcome and group vary.
  4. Neither outcome nor group varies.
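
One way to see which case each study falls into is to count the distinct values of each column per study. The sketch below assumes dplyr is installed and uses the toy data from this question; the column names n_group and n_outcome are made up for illustration.

```r
library(dplyr)

# Toy data from the question
h <- read.table(text = "
study outcome group time
a     1       1     0
a     2       1     1
b     1       1     0
b     1       2     0
c     2       1     0
c     3       2     1
d     1       1     0
d     1       1     0
e     1       1     0", header = TRUE)

# Count the distinct values of group and outcome within each study
# (n_group and n_outcome are illustrative names, not part of the original code)
counts <- h %>%
  group_by(study) %>%
  summarise(n_group = n_distinct(group),
            n_outcome = n_distinct(outcome))

print(counts)
```

A count of 1 means the column is constant within that study, and a count above 1 means it varies. Study a, for instance, has one group value and two outcome values, so it belongs to case 1.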

The function foo below implements exactly these four cases for outcome and group.

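Question: how can I generalize foo so that the user can pass in the names of the columns of their choice (e.g., outcome, group), and foo automatically checks within which unique study values each of the chosen columns is constant or varies?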

P.S. For the example below, the generalized function should produce the same output as shown below.

h = "
study outcome group time
a     1       1     0
a     2       1     1
b     1       1     0
b     1       2     0
c     2       1     0
c     3       2     1
d     1       1     0
d     1       1     0
e     1       1     0"
h <- read.table(text = h, header = TRUE)

library(dplyr)  # provides %>%, group_by(), filter(), n_distinct(), ungroup()

foo <- function(dat, cond) {
  
  switch(cond, 
         
         `1` = dat %>% 
           group_by(study) %>%
           filter(n_distinct(group) == 1, n_distinct(outcome) > 1) %>%
           ungroup,
         `2` = dat %>% 
           group_by(study) %>%
           filter(n_distinct(group) > 1, n_distinct(outcome) == 1) %>%
           ungroup,
         
         `3` =  dat %>% 
           group_by(study) %>%
           filter(n_distinct(group) > 1, n_distinct(outcome) > 1) %>%
           ungroup,
         `4` = dat %>% 
           group_by(study) %>%
           filter(n_distinct(group) == 1, n_distinct(outcome) == 1) %>%
           ungroup )  } 

#------------------- EXAMPLE OF USE:
> foo(h, 1)
# A tibble: 2 x 3
  study outcome group
  <chr>   <int> <int>
1 a           1     1
2 a           2     1
> foo(h, 2)
# A tibble: 2 x 3
  study outcome group
  <chr>   <int> <int>
1 b           1     1
2 b           1     2
> foo(h, 3)
# A tibble: 2 x 3
  study outcome group
  <chr>   <int> <int>
1 c           2     1
2 c           3     2
> foo(h, 4)
# A tibble: 3 x 3
  study outcome group
  <chr>   <int> <int>
1 d           1     1
2 d           1     1
3 e           1     1

Tags: r, dataframe, function, dplyr, tidyverse

Solution


If the input arguments are passed unquoted, use {{ }}:

foo <- function(dat, study_col, group_col, outcome_col) {

  # Return the rows of the studies that match one of the four cases
  fn1 <- function(cond) {
    switch(cond,

           `1` = dat %>%
             group_by({{study_col}}) %>%
             filter(n_distinct({{group_col}}) == 1, n_distinct({{outcome_col}}) > 1) %>%
             ungroup,
           `2` = dat %>%
             group_by({{study_col}}) %>%
             filter(n_distinct({{group_col}}) > 1, n_distinct({{outcome_col}}) == 1) %>%
             ungroup,
           `3` = dat %>%
             group_by({{study_col}}) %>%
             filter(n_distinct({{group_col}}) > 1, n_distinct({{outcome_col}}) > 1) %>%
             ungroup,

           `4` = dat %>%
             group_by({{study_col}}) %>%
             filter(n_distinct({{group_col}}) == 1, n_distinct({{outcome_col}}) == 1) %>%
             ungroup
    )
  }

  # Run all four cases and collect the results in a list
  purrr::map(1:4, ~ fn1(.x))
}

-testing

> foo(h, study, group, outcome)
[[1]]
# A tibble: 2 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 a           1     1     0
2 a           2     1     1

[[2]]
# A tibble: 2 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 b           1     1     0
2 b           1     2     0

[[3]]
# A tibble: 2 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 c           2     1     0
2 c           3     2     1

[[4]]
# A tibble: 3 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 d           1     1     0
2 d           1     1     0
3 e           1     1     0
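
If the column names arrive as character strings instead of unquoted names, {{ }} no longer applies. A minimal sketch of that variant follows, assuming dplyr 1.0.0 or later; it uses the .data pronoun and all_of(), and the function name foo_chr and its cond argument are hypothetical additions, not part of the answer above.

```r
library(dplyr)

# Toy data from the question
h <- read.table(text = "
study outcome group time
a     1       1     0
a     2       1     1
b     1       1     0
b     1       2     0
c     2       1     0
c     3       2     1
d     1       1     0
d     1       1     0
e     1       1     0", header = TRUE)

# Hypothetical variant of foo: column names are passed as strings,
# and cond (1 to 4) follows the numbering of the four cases in the question
foo_chr <- function(dat, study_col, group_col, outcome_col, cond) {
  # Translate the case number into "should this column be constant?"
  group_constant   <- cond %in% c(1, 4)
  outcome_constant <- cond %in% c(2, 4)

  dat %>%
    group_by(across(all_of(study_col))) %>%
    filter((n_distinct(.data[[group_col]]) == 1) == group_constant,
           (n_distinct(.data[[outcome_col]]) == 1) == outcome_constant) %>%
    ungroup()
}

res1 <- foo_chr(h, "study", "group", "outcome", cond = 1)
res4 <- foo_chr(h, "study", "group", "outcome", cond = 4)
```

Here res1 holds the rows of study a (group constant, outcome varies) and res4 holds the rows of studies d and e (both constant).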

Alternatively, use group_split():

foo2 <- function(dat, study_col, group_col, outcome_col) {

  dat %>%
    dplyr::select({{study_col}}, {{group_col}}, {{outcome_col}}) %>%
    dplyr::group_by({{study_col}}) %>%
    # grp encodes, per study, whether group and outcome are constant
    dplyr::mutate(grp = stringr::str_c(n_distinct({{group_col}}) == 1,
                                       n_distinct({{outcome_col}}) == 1)) %>%
    dplyr::ungroup(.) %>%
    dplyr::group_split(grp, .keep = FALSE)
}

-testing

> foo2(h, study, group, outcome)
<list_of<
  tbl_df<
    study  : character
    group  : integer
    outcome: integer
  >
>[4]>
[[1]]
# A tibble: 2 x 3
  study group outcome
  <chr> <int>   <int>
1 c         1       2
2 c         2       3

[[2]]
# A tibble: 2 x 3
  study group outcome
  <chr> <int>   <int>
1 b         1       1
2 b         2       1

[[3]]
# A tibble: 2 x 3
  study group outcome
  <chr> <int>   <int>
1 a         1       1
2 a         1       2

[[4]]
# A tibble: 3 x 3
  study group outcome
  <chr> <int>   <int>
1 d         1       1
2 d         1       1
3 e         1       1
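
Note that the list returned by foo2 is ordered by the sorted grp label ("FALSEFALSE", "FALSETRUE", "TRUEFALSE", "TRUETRUE"), not by the case numbers 1 to 4 from the question. If any number of columns should be checked and the pieces should carry a label, something like the following sketch may help. It assumes dplyr 1.0.0 or later; the helper name flag_constant and the "_constant" suffix are made up for illustration.

```r
library(dplyr)

# Toy data from the question
h <- read.table(text = "
study outcome group time
a     1       1     0
a     2       1     1
b     1       1     0
b     1       2     0
c     2       1     0
c     3       2     1
d     1       1     0
d     1       1     0
e     1       1     0", header = TRUE)

# Hypothetical helper: for each study, add one logical column per selected
# column saying whether that column is constant within the study
flag_constant <- function(dat, study_col, cols) {
  dat %>%
    group_by({{ study_col }}) %>%
    mutate(across({{ cols }}, ~ n_distinct(.x) == 1,
                  .names = "{.col}_constant")) %>%
    ungroup()
}

flagged <- flag_constant(h, study, c(group, outcome))

# Build a label such as "TRUE/FALSE" from the flag columns and split on it,
# so each piece of the list is named by its constant/varying pattern
flag_cols <- grep("_constant$", names(flagged), value = TRUE)
pattern <- do.call(paste, c(as.list(flagged[flag_cols]), sep = "/"))
pieces <- split(flagged, pattern)

names(pieces)
```

The label order follows the order of the selected columns, so with c(group, outcome) the piece named "TRUE/FALSE" contains the studies where group is constant and outcome varies.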
