Bootstrapping by multiple groups in the tidyverse: rsample vs. broom

Question

In this Stack Overflow question, bootstrapping by several groups and subgroups seemed easy: call broom::bootstrap with the by_group argument set to TRUE.

My desired output is a nested tibble with n rows, where the data column holds the bootstrapped data generated by each bootstrap call, and each group and subgroup has the same number of cases as in the original data.

In broom I did the following:

# packages
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)
library(rsample)
library(broom)

# some data to bootstrap
set.seed(123)
data <- tibble(
  group=rep(c('group1','group2','group3','group4'), 25),
  subgroup=rep(c('subgroup1','subgroup2','subgroup3','subgroup4'), 25),
  v1=rnorm(100),
  v2=rnorm(100)
)

# the actual approach using broom::bootstrap
tibble(id = 1:100) %>%
  mutate(data = map(id, ~ {
    data %>%
      group_by(group, subgroup) %>%
      broom::bootstrap(100, by_group = TRUE)
  }))

Since the broom::bootstrap function is deprecated, I rebuilt my approach with rsample::bootstraps to produce the same output. Getting there seems to be much more complicated. Am I doing something wrong, or has generating grouped bootstraps become more complicated in the tidyverse?

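data %>%
  # duplicate the grouping columns so they survive inside the nested data
  dplyr::mutate(group2 = group,
                subgroup2 = subgroup) %>%
  # one row per group/subgroup combination (tidyr >= 1.0 syntax)
  tidyr::nest(data = -c(group2, subgroup2)) %>%
  # 100 bootstrap resamples within each combination
  dplyr::mutate(boot = purrr::map(data, ~ rsample::bootstraps(., 100))) %>%
  dplyr::pull(boot) %>%
  purrr::map(., "splits") %>%
  # regroup so that element i holds the i-th split of every combination
  purrr::transpose() %>%
  # bind the resampled rows of all combinations for each replicate
  purrr::map(., ~ purrr::map_dfr(., rsample::analysis)) %>%
  tibble(id = seq_along(.), data = .)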

Tags: r, tidyverse, resampling

Solution
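
One possible way to shorten the rsample pipeline, sketched here under assumptions rather than taken from the original thread, is to combine group and subgroup into a single column and pass it to the strata argument of rsample::bootstraps. Resampling then happens within each stratum, so every group/subgroup combination should keep its original number of cases. The column name stratum and the object names stratified, boots and boot_tbl are made up for this example. Recent rsample versions may pool very small strata; with four equally sized strata, as in the example data, that should not come into play.

```r
library(dplyr)
library(purrr)
library(tibble)
library(rsample)

# same example data as in the question
set.seed(123)
data <- tibble(
  group = rep(c("group1", "group2", "group3", "group4"), 25),
  subgroup = rep(c("subgroup1", "subgroup2", "subgroup3", "subgroup4"), 25),
  v1 = rnorm(100),
  v2 = rnorm(100)
)

# 'stratum' is a hypothetical helper column: one label per group/subgroup pair
stratified <- data %>%
  mutate(stratum = paste(group, subgroup, sep = "_"))

# 100 bootstrap replicates, resampled within each stratum
boots <- bootstraps(stratified, times = 100, strata = stratum)

# nested tibble: one row per replicate, resampled rows in a list column
boot_tbl <- tibble(
  id = boots$id,
  data = map(boots$splits, analysis)
)
```

Here id holds the labels generated by rsample rather than the integers 1 to 100 used in the question; replacing boots$id with seq_along(boots$splits) would give integer ids instead.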

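If rsample is not a requirement, an alternative is to resample with replacement inside each group using plain dplyr. This is a swapped-in technique, not the broom or rsample API, and it assumes dplyr 1.0.0 or later for slice_sample. The helper name boot_once and the object names boot_list and boot_tbl are hypothetical.

```r
library(dplyr)
library(purrr)
library(tibble)

# same example data as in the question
set.seed(123)
data <- tibble(
  group = rep(c("group1", "group2", "group3", "group4"), 25),
  subgroup = rep(c("subgroup1", "subgroup2", "subgroup3", "subgroup4"), 25),
  v1 = rnorm(100),
  v2 = rnorm(100)
)

# hypothetical helper: draw one bootstrap replicate, keeping group sizes fixed
boot_once <- function(df) {
  df %>%
    group_by(group, subgroup) %>%
    # prop = 1 with replace = TRUE draws as many rows as each group has
    slice_sample(prop = 1, replace = TRUE) %>%
    ungroup()
}

# build the replicates first, then wrap them in a nested tibble
boot_list <- map(seq_len(100), ~ boot_once(data))
boot_tbl <- tibble(id = seq_len(100), data = boot_list)
```

Building the list before calling tibble() avoids any ambiguity between the global object data and the new data column.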
