How to summarise by sets of grouping variables in R and dplyr?

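Question

I want to group a data frame by several different sets of grouping variables. For each set, I want to count the observations (or summarise them in some other way) and then collect all the results in a single data frame.

Important: I want to define the sets of grouping variables programmatically, e.g. as a list.

How can I achieve this in the tidyverse?

Here is my attempt: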

library(tidyverse)

count_by_group <- function(...) {
  mtcars %>%
    count(...) %>%
    mutate(
      grouping_variable = paste(ensyms(...), collapse = "."),
      group = paste(!!!enquos(...), sep = ".")
    ) %>%
    select(grouping_variable, group, n)
}

# I want this ...
bind_rows(
  count_by_group(cyl),
  count_by_group(gear),
  count_by_group(cyl, gear)
)
#>    grouping_variable group  n
#> 1                cyl     4 11
#> 2                cyl     6  7
#> 3                cyl     8 14
#> 4               gear     3 15
#> 5               gear     4 12
#> 6               gear     5  5
#> 7           cyl.gear   4.3  1
#> 8           cyl.gear   4.4  8
#> 9           cyl.gear   4.5  2
#> 10          cyl.gear   6.3  2
#> 11          cyl.gear   6.4  4
#> 12          cyl.gear   6.5  1
#> 13          cyl.gear   8.3 12
#> 14          cyl.gear   8.5  2

# ... but without the repetition of "count_by_group(var)".
# The following does not work:
map_dfr(
  list(
    cyl,
    gear,
    c(cyl, gear)
  ),
  count_by_group
)
#> Error in map(.x, .f, ...): object 'cyl' not found

Created on 2020-09-17 by the reprex package (v0.3.0)

Tags: r, dplyr, purrr, tidyeval

Solution


Update (2020-10-12): a more transparent solution (thanks to @LionelHenry)

library(tidyverse)

count_by_group <- function(...) {
  dots <- enquos(..., .named = TRUE)
  names <- names(dots)

  counted <- count(mtcars, !!!dots)

  group <- counted %>%
    select(-n) %>%
    rowwise() %>%
    mutate(paste(c_across(everything()), collapse = ".")) %>%
    pull()

  # # Equivalently (pmap_chr() returns a character vector, one string per row):
  # group <- counted %>%
  #   select(-n) %>%
  #   pmap_chr(paste, sep = ".")

  counted %>%
    mutate(
      grouping_variable = paste(names, collapse = "."),
      group = group
    ) %>%
    select(grouping_variable, group, n)
}

grouping_variables <- list(
  vars(cyl),
  vars(gear),
  vars(cyl, gear)
)

map_dfr(grouping_variables, ~ count_by_group(!!! .x))
#>    grouping_variable group  n
#> 1                cyl     4 11
#> 2                cyl     6  7
#> 3                cyl     8 14
#> 4               gear     3 15
#> 5               gear     4 12
#> 6               gear     5  5
#> 7           cyl.gear   4.3  1
#> 8           cyl.gear   4.4  8
#> 9           cyl.gear   4.5  2
#> 10          cyl.gear   6.3  2
#> 11          cyl.gear   6.4  4
#> 12          cyl.gear   6.5  1
#> 13          cyl.gear   8.3 12
#> 14          cyl.gear   8.5  2

Created on 2020-10-12 by the reprex package (v0.3.0)
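
If the grouping sets are easier to store as plain character vectors of column names, the same table can probably be built without quosures. The following is only a minimal sketch under that assumption, not the approach from the answer above: `count_by_names()` and `grouping_sets` are hypothetical names introduced here, and it swaps in `across(all_of())` plus `tidyr::unite()` in place of `enquos()` and `!!!`.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper (not part of the original answer):
# `vars` is a character vector of column names to group by.
count_by_names <- function(data, vars) {
  data %>%
    # count the rows for each combination of the named columns
    count(across(all_of(vars))) %>%
    # paste the grouping columns into a single "group" label column
    unite("group", all_of(vars), sep = ".") %>%
    mutate(grouping_variable = paste(vars, collapse = ".")) %>%
    select(grouping_variable, group, n)
}

# Assumed input format: one character vector per grouping set
grouping_sets <- list("cyl", "gear", c("cyl", "gear"))

result <- map_dfr(grouping_sets, ~ count_by_names(mtcars, .x))
```

Because the sets are ordinary strings, they can be read from a config file or generated in a loop, which may fit the "programmatically defined" requirement more directly than `vars()`.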


I just discovered that this works!

library(tidyverse)

count_by_group <- function(...) {
  mtcars %>%
    count(...) %>%
    mutate(
      grouping_variable = paste(ensyms(...), collapse = "."),
      group = paste(!!!enquos(...), sep = ".")
    ) %>%
    select(grouping_variable, group, n)
}

grouping_variables <- list(
  vars(cyl),
  vars(gear),
  vars(cyl, gear)
)

map_dfr(grouping_variables, ~count_by_group(!!! .))
#>    grouping_variable group  n
#> 1                cyl     4 11
#> 2                cyl     6  7
#> 3                cyl     8 14
#> 4               gear     3 15
#> 5               gear     4 12
#> 6               gear     5  5
#> 7           cyl.gear   4.3  1
#> 8           cyl.gear   4.4  8
#> 9           cyl.gear   4.5  2
#> 10          cyl.gear   6.3  2
#> 11          cyl.gear   6.4  4
#> 12          cyl.gear   6.5  1
#> 13          cyl.gear   8.3 12
#> 14          cyl.gear   8.5  2

Created on 2020-10-12 by the reprex package (v0.3.0)
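
The question also asks about summarising "in some other way" than counting. A rough sketch of how the `vars()` list might be reused for arbitrary summaries follows. `summarise_by_group()` is a hypothetical helper, not something from the original answer, and it assumes each element of the list contains bare column names only.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper: `groups` is a list of quosures built with vars();
# `...` are name-value pairs passed on unchanged to summarise().
summarise_by_group <- function(data, groups, ...) {
  # column names of the grouping variables, e.g. c("cyl", "gear")
  group_names <- unname(map_chr(groups, as_label))

  data %>%
    group_by(!!!groups) %>%
    summarise(..., .groups = "drop") %>%
    # paste the grouping columns into a single "group" label column
    unite("group", all_of(group_names), sep = ".") %>%
    mutate(grouping_variable = paste(group_names, collapse = "."), .before = 1)
}

grouping_variables <- list(vars(cyl), vars(gear), vars(cyl, gear))

result <- map_dfr(
  grouping_variables,
  ~ summarise_by_group(mtcars, .x, n = n(), mean_mpg = mean(mpg))
)
```

Passing the summaries through `...` keeps the helper independent of any particular statistic; the counting case is just `n = n()`.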

