pivot_wider with duplicates

Question

I have the following df:

df <- structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2), value = c("p", 
"p", "p1", "p2", "p3", "a", "b", "c", "d"), i1 = c(1, 1, 1, 1, 
1, 1, 1, 1, 1)), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))
     ID value    i1
  <dbl> <chr> <dbl>
1     1 p         1
2     1 p         1
3     1 p1        1
4     1 p2        1
5     1 p3        1
6     2 a         1
7     2 b         1
8     2 c         1
9     2 d         1

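When I try to pivot it, I get a warning message saying that the values are not uniquely identified, i.e. that there are duplicates.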

df %>% pivot_wider(names_from = value, values_from = i1, values_fill = list(i1 = 0))
Warning message:
Values in `i1` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(i1 = list)` to suppress this warning.
* Use `values_fn = list(i1 = length)` to identify where the duplicates arise
* Use `values_fn = list(i1 = summary_fun)` to summarise duplicates

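I would like to identify which values are duplicated for each unique ID so that I can filter them out. Alternatively, perhaps I could drop the duplicates within the pivot_wider() step itself. The function has a names_repair argument, which I set to "unique", but that does not work!

The desired output is: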

    p  p1  p2  p3  a  b  c  d
1   1   1   1   1  0  0  0  0
2   0   0   0   0  1  1  1  1

Tags: r, dplyr, pivot

Solution


I think what the OP was trying to do is remove the duplicate rows and then pivot the data. That can be done with distinct followed by pivot_wider:

library(dplyr)
library(tidyr)

df %>%
 distinct() %>%
 pivot_wider(names_from = value, values_from = i1, values_fill = list(i1 = 0))

# A tibble: 2 x 9
#     ID     p    p1    p2    p3     a     b     c     d
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1     1     1     1     1     0     0     0     0
#2     2     0     0     0     0     1     1     1     1
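
The question also asks whether the duplicates can be dropped inside the pivot_wider() step. The warning message itself points at values_fn for that, so here is a minimal sketch of that idea. It assumes that collapsing repeated ID/value rows with max is acceptable (every i1 is 1 here, so any summary that returns a single value would give the same result), and it rebuilds df with tibble() only so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- tibble(
  ID    = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  value = c("p", "p", "p1", "p2", "p3", "a", "b", "c", "d"),
  i1    = 1
)

wide <- df %>%
  pivot_wider(
    names_from  = value,
    values_from = i1,
    values_fn   = list(i1 = max),  # collapse repeated ID/value rows to a single value
    values_fill = list(i1 = 0)     # ID/value pairs that never occur become 0
  )

wide
```

Compared with distinct(), this keeps the de-duplication next to the reshaping, but the choice of summary function (max here) is an assumption and matters as soon as duplicated rows carry different values.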


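We can also use count with pivot_wider. Here count gives the number of rows for each ID/value pair, and +(n > 0) turns that count into a 0/1 indicator: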

df %>% 
  count(ID, value) %>%        # one row per ID/value pair, with its frequency n
  mutate(n = +(n > 0)) %>%    # convert the count into a 0/1 presence flag
  pivot_wider(names_from = value, values_from = n, values_fill = list(n = 0))
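
For the other half of the question, identifying which values are duplicated within each ID, the same count step can be followed by a filter instead of a pivot. This is only a sketch: the name dupes is made up for illustration, and df is rebuilt so that the block is self-contained.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- tibble(
  ID    = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  value = c("p", "p", "p1", "p2", "p3", "a", "b", "c", "d"),
  i1    = 1
)

# Count the rows for each ID/value pair and keep only pairs seen more than once
dupes <- df %>%
  count(ID, value) %>%
  filter(n > 1)

dupes
```

With the data above, dupes holds a single row (ID 1, value "p", n = 2), which is the pair that triggers the warning.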
