r - Bootstrapping by multiple groups in the tidyverse: rsample vs. broom
Problem description
In this SO question, bootstrapping by several groups and subgroups seemed easy using the broom::bootstrap function with the by_group argument set to TRUE.
My desired output is a nested tibble with n rows, where the data column contains the bootstrapped data generated by each bootstrap call (and each group and subgroup has the same number of cases as in the original data).
In broom, I did the following:
# packages
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)
library(rsample)
library(broom)
# some data to bootstrap
set.seed(123)
data <- tibble(
  group = rep(c('group1', 'group2', 'group3', 'group4'), 25),
  subgroup = rep(c('subgroup1', 'subgroup2', 'subgroup3', 'subgroup4'), 25),
  v1 = rnorm(100),
  v2 = rnorm(100)
)
# the actual approach using broom::bootstrap
tibble(id = 1:100) %>%
  mutate(data = map(id, ~ {
    data %>%
      group_by(group, subgroup) %>%
      broom::bootstrap(100, by_group = TRUE)
  }))
Since the broom::bootstrap function is deprecated, I rebuilt my approach to produce the desired output using rsample::bootstraps. It seems much more complicated to get there. Am I doing something wrong, or have things gotten more complicated in the tidyverse when generating grouped bootstraps?
data %>%
  dplyr::mutate(group2 = group,
                subgroup2 = subgroup) %>%
  tidyr::nest(-group2, -subgroup2) %>%
  dplyr::mutate(boot = map(data, ~ rsample::bootstraps(., 100))) %>%
  pull(boot) %>%
  purrr::map(., "splits") %>%
  transpose() %>%
  purrr::map(., ~ purrr::map_dfr(., rsample::analysis)) %>%
  tibble(id = 1:length(.), data = .)
Solution
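One possible simplification, offered as a sketch rather than as the accepted answer, is to skip both broom::bootstrap and rsample::bootstraps and resample within groups using dplyr::slice_sample (assuming dplyr 1.0.0 or later). With prop = 1 and replace = TRUE on a grouped tibble, each group/subgroup combination is drawn with replacement at its original size. The names original, resample_within_groups, and n_boot are hypothetical and introduced only for this illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

# Same example data as in the question (named `original` here to avoid
# clashing with the `data` list-column created below).
set.seed(123)
original <- tibble(
  group    = rep(c("group1", "group2", "group3", "group4"), 25),
  subgroup = rep(c("subgroup1", "subgroup2", "subgroup3", "subgroup4"), 25),
  v1       = rnorm(100),
  v2       = rnorm(100)
)

# Hypothetical helper: one bootstrap replicate in which every
# group/subgroup combination keeps its original number of rows.
resample_within_groups <- function(df) {
  df %>%
    group_by(group, subgroup) %>%
    slice_sample(prop = 1, replace = TRUE) %>%  # sample with replacement per group
    ungroup()
}

# Nested tibble: one row per bootstrap replicate, resampled data in `data`.
n_boot <- 100
boots <- tibble(
  id   = seq_len(n_boot),
  data = map(seq_len(n_boot), function(i) resample_within_groups(original))
)
```

Counting rows per group in any replicate, for example with count(boots$data[[1]], group, subgroup), should show the same group sizes as in the original data. This sketch returns plain tibbles rather than rsample split objects, so it fits when only the resampled data are needed.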