How do I pivot_wider() so that duplicates are kept in their own columns?

Problem description

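I have a long data frame that I want to widen. The long data frame contains duplicates, and I want to keep them as separate columns in the wide format, but the wide output combines the duplicates into list-columns. I tried a few things with pivot_wider() and thought unnest() might be what I need. Below is the closest I could get to what I'm looking for, but it isn't quite there.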

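In the example below, I want the wide data frame to contain the values of the variable "surprise" for each "box", with as many columns as there are surprises in a box, whether or not they are unique. Conceptually, what I am after is every surprise in a box, duplicates included.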

Is it possible to widen long data this way?

library(tidyverse)

# some data; long format
long_box <- c("A", "A", "A", "B", "B", "B", "C", "C")
surprise <- c("apple", "orange", "orange", "apple", "banana", "insects", "apple", "insects")

# the data frame I have
tibble(long_box, surprise)

#> # A tibble: 8 x 2
#>   long_box surprise
#>   <chr>    <chr>   
#> 1 A        apple   
#> 2 A        orange  
#> 3 A        orange  
#> 4 B        apple   
#> 5 B        banana  
#> 6 B        insects 
#> 7 C        apple   
#> 8 C        insects

# same data, wide format
wide_box <- c("A", "B", "C")
a <- c(rep("apple",3))
b <- c("orange", "banana", "insects")
c <- c("orange", "insects", NA)

# the data frame format I want
tibble(wide_box, a, b, c) %>% 
  rename(suprise_1 = a,
         suprise_2 = b,
         suprise_3 = c)

#> # A tibble: 3 x 4
#>   wide_box suprise_1 suprise_2 suprise_3
#>   <chr>    <chr>     <chr>     <chr>    
#> 1 A        apple     orange    orange   
#> 2 B        apple     banana    insects  
#> 3 C        apple     insects   <NA>

# this is what I've tried to get from long to wide
tibble(long_box, surprise) %>% 
  pivot_wider(id_cols = long_box,
              names_from = surprise,
              values_from = surprise)

#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> # A tibble: 3 x 5
#>   long_box apple     orange    banana    insects  
#>   <chr>    <list>    <list>    <list>    <list>   
#> 1 A        <chr [1]> <chr [2]> <NULL>    <NULL>   
#> 2 B        <chr [1]> <NULL>    <chr [1]> <chr [1]>
#> 3 C        <chr [1]> <NULL>    <NULL>    <chr [1]>

Created on 2021-07-15 by the reprex package (v2.0.0)

Tags: r, tidyr

Solution


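Create a sequence column that numbers the rows within each box, so that every combination of box and new column name is unique, and pivot_wider() should then work.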
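To see what such a sequence column looks like on its own, here is a minimal sketch using rowid() from data.table on the box labels from the question. The variable name idx is only illustrative and does not appear in the original answer.

```r
library(data.table)

# Box labels from the question (long format)
long_box <- c("A", "A", "A", "B", "B", "B", "C", "C")

# rowid() numbers each element by its position within its own group,
# restarting at 1 whenever a new box label begins
idx <- rowid(long_box)
print(idx)
#> [1] 1 2 3 1 2 3 1 2
```

Because the counter restarts for every box, pasting it onto a prefix gives each duplicated surprise its own column name.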

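library(dplyr)
library(tidyr)
library(data.table)  # provides rowid()
library(stringr)

# Number the rows within each box, build a column name from that counter,
# then pivot so each counter value becomes its own column
tibble(long_box, surprise) %>%
  mutate(nm1 = str_c('suprise_', rowid(long_box))) %>%
  pivot_wider(names_from = nm1, values_from = surprise)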

Output:

# A tibble: 3 x 4
  long_box suprise_1 suprise_2 suprise_3
  <chr>    <chr>     <chr>     <chr>    
1 A        apple     orange    orange   
2 B        apple     banana    insects  
3 C        apple     insects   <NA>     
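If you would rather not load data.table and stringr just for the counter, the same idea can probably be expressed with dplyr and tidyr alone. The sketch below is an alternative to the answer above, not the original author's code: it assumes group_by() plus row_number() for the per-box counter and the names_prefix argument of pivot_wider() for the column names, and the names idx and wide are illustrative.

```r
library(dplyr)
library(tidyr)

# Data from the question (long format)
long_box <- c("A", "A", "A", "B", "B", "B", "C", "C")
surprise <- c("apple", "orange", "orange", "apple", "banana", "insects", "apple", "insects")

wide <- tibble(long_box, surprise) %>%
  group_by(long_box) %>%
  mutate(idx = row_number()) %>%   # per-box counter: 1, 2, 3, ...
  ungroup() %>%
  pivot_wider(names_from = idx,
              values_from = surprise,
              names_prefix = "suprise_")  # keeps the question's column spelling

print(wide)
```

Boxes with fewer surprises than the largest box are padded with NA, which matches the wide format requested in the question.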
