首页 > 解决方案 > 在嵌套的小标题中分组汇总(带有排列)

问题描述

我有一个相当简单的问题,答案已经很复杂(通过循环),但我希望有人能指出我在purrr.

基本上,我正在考虑为我的学生介绍排列,作为统计推断(即tz值)的样板方法的计算替代方法。在我设置的玩具示例中,我正在做一些分组方法(通过dplyr''sgroup_by()summarize())以及排列 via modelr。我想知道如何将分组均值存储在包含排列的嵌套小标题中。

我已经有一个通过循环的解决方案(绕过将它们存储在带有排列的小标题中),但我想看看解决方案是什么purrr

这是我正在做的一个基本示例。

library(tidyverse)
library(modelr)

mtcars %>%
  permute(1000, mpg) -> perm_mtcars

perm_sums <- tibble()

# convoluted loop answer, does what I want,
# but is convoluted loop and spams the R console with messages
# about "ungrouping output" because of group_by()
for (i in 1:1000) {
  perm_mtcars %>%
    slice(i) %>%
    pull(perm) %>% as.data.frame %>%
    group_by(cyl) %>%
    summarize(mean = mean(mpg)) %>%
    mutate(perm = i) -> hold_this
  perm_sums <- bind_rows(perm_sums, hold_this)
}

# what I'd like to do, based off how easy this is to pull off with running regressions,
# tidying the output, and extracting that.
perm_mtcars %>%
  mutate(groupsums = map(perm, ~summarize(???)) %>%
  # and where I might be getting ahead of myself
  pull(groupsums) %>% 
  map2_df(., seq(1, 1000), ~mutate(.x, perm = .y))

借用这种表达方式,这可能很容易,purrrpurrr现在对我来说主要是希腊语。

标签: rpermutationpurrr

解决方案


在我看来,您可能会受益于对“列表列”进行操作然后使用该tidyr::unnest功能。

在这个例子中,我使用lapply了对列表列进行操作,但是purrr::map如果你真的想要的话,你可以很容易地使用。

library(tidyverse)
library(modelr)

groupmean <- function(x) {
  x %>% 
    as.data.frame %>%
    group_by(cyl) %>%
    summarize(mpg_mean = mean(mpg), .groups = 'drop')
}

perm_means <- mtcars %>%
  permute(1000, mpg) %>%
  mutate(perm = lapply(perm, groupmean)) %>%
  unnest(perm)

perm_means %>% head
#> # A tibble: 6 x 3
#>     cyl mpg_mean .id  
#>   <dbl>    <dbl> <chr>
#> 1     4     17.5 0001 
#> 2     6     23.6 0001 
#> 3     8     20.3 0001 
#> 4     4     20.1 0002 
#> 5     6     19.6 0002 
#> 6     8     20.3 0002

对于后代,这是使用的等价物data.table

library(data.table)
library(modelr)

f = function(x) as.data.table(x)[, .(mpg_mean = mean(mpg)), by=.(cyl)]
perm_mtcars = permute(mtcars, 1000, mpg)
perm_mtcars = data.table(perm_mtcars)
perm_mtcars[, perm := lapply(perm, f)][
            , perm[[1]], by=.(.id)]
#>        .id cyl mpg_mean
#>    1: 0001   6 17.21429
#>    2: 0001   4 22.52727
#>    3: 0001   8 19.61429
#>    4: 0002   6 19.92857
#>    5: 0002   4 22.40909
#>   ---                  
#> 2996: 0999   4 20.85455
#> 2997: 0999   8 19.22143
#> 2998: 1000   6 18.41429
#> 2999: 1000   4 18.20000
#> 3000: 1000   8 22.41429

推荐阅读