Converting a long list to a binary data frame when there are duplicates

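Question

According to this question and answer, a long list can be converted to a binary data frame.

But how can this be applied to a data frame in which the same value occurs more than once for a given user?

Example data frame: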

d_long <- data.frame( nameid = c("sally","sally","sally", "sally","Robert","annie","annie","annie"), value = c("product1","ra","ent","ra","ra","ra","product1","product1"))
  nameid    value
1  sally product1
2  sally       ra
3  sally      ent
4  sally       ra
5 Robert       ra
6  annie       ra
7  annie product1
8  annie product1

The expected output looks like this:

d_exist <- data.frame(nameid = c("sally","Robert","annie"), product1 = c(1,0,1), ra = c(1,1,1), ent = c(1,0,0))
  nameid product1 ra ent
1  sally        1  1   1
2 Robert        0  1   0
3  annie        1  1   0

But when I try this:

library(dplyr)
library(tidyr)

d_long %>%
  group_by(nameid, value) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  spread(value, count, fill = 0) %>%
  as.data.frame()

I get this error:

Error: Duplicate identifiers for rows (7, 8), (2, 4)

Is it correct to simply use the following instead?

d_long[!duplicated(d_long), ]

Tags: r, dplyr

Answer


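We can call distinct first and then spread. Removing the duplicate rows leaves each nameid/value pair exactly once, so spread no longer finds duplicate identifiers; a column of 1s then marks presence, and fill = 0 marks absence.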

library(tidyverse)
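d_long %>%
  distinct() %>%                # drop repeated nameid/value rows
  mutate(n = 1) %>%             # flag each remaining pair as present
  spread(value, n, fill = 0)    # one column per value, 0 where absent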
#    nameid ent product1 ra
#1  annie   0        1  1
#2 Robert   0        0  1
#3  sally   1        1  1
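
As a variation on the same idea, here is a minimal sketch that uses pivot_wider in place of spread, which the tidyr documentation now describes as superseded. It assumes a tidyr version in which values_fill accepts a single scalar (older releases may need values_fill = list(n = 0)); the name d_wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
d_long <- data.frame(
  nameid = c("sally", "sally", "sally", "sally", "Robert", "annie", "annie", "annie"),
  value  = c("product1", "ra", "ent", "ra", "ra", "ra", "product1", "product1")
)

d_wide <- d_long %>%
  distinct() %>%                      # drop repeated nameid/value rows
  mutate(n = 1) %>%                   # flag each remaining pair as present
  pivot_wider(
    names_from  = value,              # one column per distinct value
    values_from = n,
    values_fill = 0                   # 0 where a user never had that value
  ) %>%
  as.data.frame()

d_wide
```

Unlike spread, pivot_wider should keep rows and columns in order of first appearance, so the result is expected to follow the layout of d_exist rather than being sorted alphabetically. The base R filter from the question, d_long[!duplicated(d_long), ], keeps the same rows as distinct() here, so it could be used for the first step as well.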
