"Object not found" error with purrr pmap for a column added to a list column

Problem description

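I have a dataset to which I want to apply the following operations:

  1. Nest the data by a set of columns.
  2. Reshape the `data` list column to wide format.
  3. Create a new column inside the `data` list column from values in the `data` list column and in the grouping columns, using the columns created in step 2 as input.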

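Steps 1 and 2 work fine with purrr and dplyr, but in step 3, when I try to reference one of the columns created in step 2, I get an "object not found" error. If I only reference columns of the `data` list column that already existed before step 2, it works fine. When I inspect the contents of the list column after step 2, everything referenced in step 3 is there, so why isn't the newly created column recognized?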

Reprex

library(tidyverse)

mtcarsT <- mtcars %>% as_tibble() %>%
  group_by(cyl, gear, vs) %>%
  mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>%
# step 1, works fine.
  nest() %>%
  mutate(data = map(data, ~ .x %>%
                      group_by(mpg, disp, hp, drat, am, carb) %>%
# step 2, also works fine, generates new columns 'wt_C', 'wt_P', 'qsec_C', 'qsec_P' in 'data'
                      pivot_wider(names_from = cp_flag, values_from = wt:qsec) %>%
                      ungroup()))

# > mtcarsT[1,]
# A tibble: 1 x 4
# Groups:   cyl, vs, gear [1]
#    cyl    vs  gear data             
#  <dbl> <dbl> <dbl> <list>           
# 1     6     0     4 <tibble [1 x 10]>
#
# > mtcarsT$data[[1]]
# A tibble: 1 x 10
#    mpg  disp    hp  drat    am  carb  wt_C  wt_P qsec_C qsec_P
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#  1    21   160   110   3.9     1     4  2.62  2.88   16.5   17.0


# step 3A: this one works fine when only referencing columns in 'data' that already existed before step 2.
mtcarsT %>%
  mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                     .f = function(a, b, c) a %>% 
                       dplyr::mutate(vs_gear = carb - b + c)))  

# > .Last.value$data[[1]]
# A tibble: 1 x 11
#    mpg  disp    hp  drat    am  carb  wt_C  wt_P qsec_C qsec_P vs_gear
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl>
# 1    21   160   110   3.9     1     4  2.62  2.88   16.5   17.0       8


# Step 3B: this is what I want to do, use the column 'wt_P' in the 'data' list column 
# that was created in step 2 along with other columns 'vs' and 'gear' in the nested tibble, 
# but it throws the error 'object wt_P not found'
mtcarsT %>%
  mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                     .f = function(a, b, c) a %>% 
                       dplyr::mutate(vs_gear = wt_P - b + c)))

# Error: object 'wt_P' not found
# Called from: mutate_impl(.data, dots, caller_env())



I'm using R 3.6.2 x64 with tidyverse 1.3.0 inside RStudio 1.2.5033 on Windows 10.




Tags: r, dplyr, purrr

Solution


The step 3 code works as expected; the problem lies in step 1 itself.

To show that step 3 works, change `wt_P` to `wt_C` and it runs without error.

library(tidyverse)

mtcarsT %>%
   mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                      .f = function(a, b, c)
                             a %>% dplyr::mutate(vs_gear = wt_C - b + c)))

#     cyl    vs  gear data              
#   <dbl> <dbl> <dbl> <list>            
# 1     6     0     4 <tibble [1 × 11]> 
# 2     4     1     4 <tibble [8 × 11]> 
# 3     6     1     3 <tibble [2 × 11]> 
#....

The problem in step 1 is that when you do

mtcars %>% as_tibble() %>%
    group_by(cyl, gear, vs) %>%
    mutate(cp_flag = rep(c('C', 'P'), length.out = n()))

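groups that contain only one observation never get a `P` value at all, because `rep(c('C', 'P'), length.out = n())` with `n() == 1` yields only `'C'`.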
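A minimal check of this, assuming only dplyr is loaded: with a single row, `rep()` emits just the first label, and filtering for groups that have no `'P'` row lists the affected groups (they should be the `n == 1` rows in the count below). The object name `no_P` is purely illustrative.

```r
library(dplyr)

# With a single row, rep() only ever produces the first label, 'C'
rep(c('C', 'P'), length.out = 1)

# Groups in which no row is flagged 'P'; after pivot_wider() the nested
# tibbles for these groups will have no wt_P or qsec_P column
no_P <- mtcars %>%
  group_by(cyl, gear, vs) %>%
  mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>%
  summarise(has_P = any(cp_flag == 'P')) %>%
  filter(!has_P)

no_P
```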

mtcars %>%  count(cyl, gear, vs)

# A tibble: 10 x 4
#     cyl  gear    vs     n
#   <dbl> <dbl> <dbl> <int>
# 1     4     3     1     1
# 2     4     4     1     8
# 3     4     5     0     1
# 4     4     5     1     1
# 5     6     3     1     2
# 6     6     4     0     2
# 7     6     4     1     2
# 8     6     5     0     1
# 9     8     3     0    12
#10     8     5     0     2 

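So `wt_P` (and `qsec_P`) is never created for those groups, and referencing it raises an error, whereas `wt_C` exists in every nested tibble and does not. If you change the order in `rep` from `c('C', 'P')` to `c('P', 'C')`, you will get the error for `wt_C` instead, and `wt_P` will work as expected.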
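If you would rather leave step 2 alone, one possible workaround (a sketch, not part of the original answer) is to make the step 3 function defensive: add `wt_P` as `NA` to any nested tibble that lacks it before computing the new column. The names `mtcarsT` and `result` follow the question; falling back to `NA_real_` is an assumption about what a missing `P` value should mean.

```r
library(tidyverse)

# Steps 1 and 2, exactly as in the question
mtcarsT <- mtcars %>% as_tibble() %>%
  group_by(cyl, gear, vs) %>%
  mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>%
  nest() %>%
  mutate(data = map(data, ~ .x %>%
                      group_by(mpg, disp, hp, drat, am, carb) %>%
                      pivot_wider(names_from = cp_flag, values_from = wt:qsec) %>%
                      ungroup()))

# Step 3B, made defensive: create wt_P as NA where pivot_wider() did not
result <- mtcarsT %>%
  mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                     .f = function(a, b, c) {
                       if (!"wt_P" %in% names(a)) a$wt_P <- NA_real_
                       a %>% dplyr::mutate(vs_gear = wt_P - b + c)
                     }))
```

Rows from the single-observation groups simply get `vs_gear = NA` instead of stopping the whole pipeline.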


To add the missing columns, we can do:

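library(tidyverse)

# Every wide column that each nested tibble should end up with
cols <- c("wt_C", "wt_P", "qsec_C", "qsec_P")

mtcars %>%
   group_by(cyl, gear, vs) %>% 
   mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>% 
   nest() %>% 
   mutate(data = map(data, ~{ 
                       temp <- .x %>% 
                         group_by(mpg, disp, hp, drat, am, carb) %>% 
                         pivot_wider(names_from = cp_flag, values_from = wt:qsec, 
                                     values_fill = list(wt = NA, qsec = NA)) %>% 
                         ungroup()
                       # add any expected column that pivot_wider() did not create
                       temp[setdiff(cols, names(temp))] <- NA_real_
                       temp
         })) 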


#     cyl    vs  gear data              
#   <dbl> <dbl> <dbl> <list>            
# 1     6     0     4 <tibble [1 × 10]> 
# 2     4     1     4 <tibble [8 × 10]> 
# 3     6     1     3 <tibble [2 × 10]> 
# 4     8     0     3 <tibble [12 × 10]>
# 5     6     1     4 <tibble [2 × 10]> 
# 6     4     1     3 <tibble [1 × 10]> 
# 7     4     0     5 <tibble [1 × 10]> 
# 8     4     1     5 <tibble [1 × 10]> 
# 9     8     0     5 <tibble [2 × 10]> 
#10     6     0     5 <tibble [1 × 10]> 

So every nested tibble now has the same number of columns.
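Another option, sketched here as an alternative rather than the approach above, is to pivot the whole data set once before nesting. `pivot_wider()` then creates `wt_C`, `wt_P`, `qsec_C` and `qsec_P` for every row (filling with `NA` where a group has no `P`), so all nested tibbles share the same columns and step 3B runs unchanged. This assumes the remaining id columns uniquely identify each `C`/`P` pair, which holds for `mtcars` here but should be checked for other data; `mtcarsW` and `result` are illustrative names.

```r
library(tidyverse)

# Flag rows within each group, then pivot the full data set before nesting
mtcarsW <- mtcars %>% as_tibble() %>%
  group_by(cyl, gear, vs) %>%
  mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>%
  ungroup() %>%
  pivot_wider(names_from = cp_flag, values_from = wt:qsec) %>%
  group_by(cyl, gear, vs) %>%
  nest()

# Step 3B from the question now works, because wt_P exists everywhere
result <- mtcarsW %>%
  mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                     .f = function(a, b, c) a %>%
                       dplyr::mutate(vs_gear = wt_P - b + c)))
```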

