Use map to mutate each group using values from a nested column

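Question

Consider the experimental setup below, where group is the treatment, init is the initial value of each sample, change is the expected change after treatment, and sd_change is the standard deviation of that change.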

library(tidyverse)

set.seed(001)
data1 <- tibble(group = rep(c("a", "b"), each = 4),
       init = rpois(8, 10)) %>%
  group_by(group, init) %>%
  expand(change = seq(2, 6, 2)) %>%
  mutate(sd_change = 2)  
as_tibble(data1)

> data1
# A tibble: 24 x 4
# Groups:   group, init [8]
   group  init change sd_change
   <chr> <int>  <dbl>     <dbl>
 1 a         7      2         2
 2 a         7      4         2
 3 a         7      6         2
 4 a         8      2         2
 5 a         8      4         2
 6 a         8      6         2
 7 a        10      2         2
 8 a        10      4         2
 9 a        10      6         2
10 a        11      2         2
# ... with 14 more rows

I generate the final values and compute the mean and variance for each combination of group and change as follows:

data2a <- data1 %>%
  rowwise %>%
  mutate(final = rnorm(1, change, sd_change) + init) %>%
  ungroup

data2a %>%
  group_by(group, change) %>%
  summarise(mu_start = mean(init), mu_end = mean(final), 
            v_start = var(init), v_end = var(final)) 

# A tibble: 6 x 6
# Groups:   group [2]
  group change mu_start mu_end v_start v_end
  <chr>  <dbl>    <dbl>  <dbl>   <dbl> <dbl>
1 a          2      9     10.9    3.33 13.9 
2 a          4      9     14.7    3.33  4.90
3 a          6      9     15.5    3.33 10.2 
4 b          2     11.5   13.2    4.33  3.69
5 b          4     11.5   14.8    4.33 17.8 
6 b          6     11.5   17.7    4.33  9.77

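I would like to repeat the above process by generating R final random values instead of one. I can do this with a for loop, but I am learning purrr and I get stuck at the summarising step. See one version below: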

# function to generate final values where R = 3
f <- function(n = 3, x, y, z) {
  out <- rnorm(n, x, y)
  out + z
}

data2b <- data1 %>%  
  mutate(final = pmap(list(z = init,
                           x = change,
                           y = sd_change),
                      f)) %>%
  ungroup

as_tibble(data2b)
# A tibble: 24 x 5
   group  init change sd_change final    
   <chr> <int>  <dbl>     <dbl> <list>   
 1 a         7      2         2 <dbl [3]>
 2 a         7      4         2 <dbl [3]>
 3 a         7      6         2 <dbl [3]>
 4 a         8      2         2 <dbl [3]>
 5 a         8      4         2 <dbl [3]>
 6 a         8      6         2 <dbl [3]>
 7 a        10      2         2 <dbl [3]>
 8 a        10      4         2 <dbl [3]>
 9 a        10      6         2 <dbl [3]>
10 a        11      2         2 <dbl [3]>
# ... with 14 more rows 

After summarising, mu_end should be a list of length R = 3 in this example. The following gives an error:

data2b %>%
  split(.$group, .$change) %>%
  mutate(mu_end = map(final, mean),
         v_end = map(final, var))

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "list"
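
The error message points at the cause: split() returns a plain list of data frames rather than a data frame, and mutate() has no method for a list. Note also that base split() expects several grouping variables to be wrapped in list() in its second argument. A minimal base R sketch of both points, using a made-up toy data frame (toy and pieces are illustrative names, not part of the question):

```r
# Toy data standing in for data2b (values are made up for illustration)
toy <- data.frame(group = c("a", "a", "b", "b"),
                  change = c(2, 4, 2, 4))

# To split on two variables, pass them together as a list
pieces <- split(toy, list(toy$group, toy$change))

class(pieces)                  # "list", so mutate() cannot be applied to it directly
is.data.frame(pieces)          # FALSE
sapply(pieces, is.data.frame)  # each element is itself a data frame
```

Any per-piece operation therefore has to be applied to each element of the list, for example with map() or lapply().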

The output should look like this:

# A tibble: 6 x 4
# Groups:   group [2]
  group change mu_end v_end
  <chr>  <dbl>  <dbl> <dbl>
1 a          2   10.9 13.9 
2 a          4   14.7  4.90
3 a          6   15.5 10.2 
4 b          2   13.2  3.69
5 b          4   14.8 17.8 
6 b          6   17.7  9.77

but each row of mu_end and v_end should be a list of length R. Any help?

Tags: r, tidyverse, purrr

Answer


We can use group_split and then loop over the resulting list of tibbles with map. Inside mutate, looping over the list column "final" with map creates the mean and var columns.

data2b %>% 
   group_split(group, change) %>%
   map_df(~ .x %>%
               mutate(mu_end = map_dbl(final, mean),
                      v_end = map_dbl(final, var)))
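
The code above uses map_dbl rather than map. As a small illustration of the difference, assuming a made-up list named draws in place of the final column: map always returns a list, while map_dbl returns a plain numeric vector and so produces an ordinary double column.

```r
library(purrr)

# Made-up stand-in for a list column holding the draws of two rows
draws <- list(c(1, 2, 3), c(4, 6, 8))

as_list <- map(draws, mean)       # a list with two length-1 numerics
as_vec  <- map_dbl(draws, mean)   # a numeric vector: 2 6

is.list(as_list)     # TRUE
is.numeric(as_vec)   # TRUE
```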

Or without splitting:

data2b %>%
    group_by(group, change) %>%
    mutate(mu_end = map_dbl(final, mean), v_end = map_dbl(final, var))
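
Both versions above give one mean and one variance per row, taken over the R draws stored in that row. If the intent is instead one mean and variance per replicate, taken across the rows of each group/change combination and stored as a list of length R (one possible reading of the question), a hedged sketch could look like the following. It assumes dplyr 1.0 or later and tidyr 1.0 or later; toy, draw and res are hypothetical names and the data are made up.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)
# Toy stand-in for data2b: two rows per group/change cell, each holding R = 3 draws
toy <- tibble(group = rep(c("a", "b"), each = 2),
              change = 2,
              final = map(1:4, ~ rnorm(3)))

res <- toy %>%
  mutate(draw = map(final, seq_along)) %>%    # replicate index 1..R for each row
  unnest(c(final, draw)) %>%                  # one row per sample and replicate
  group_by(group, change, draw) %>%
  summarise(mu = mean(final), v = var(final), .groups = "drop") %>%
  group_by(group, change) %>%
  summarise(mu_end = list(mu), v_end = list(v), .groups = "drop")

res
```

Here each row of mu_end and v_end is a list element of length R, with one value per replicate.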
