How do I create a list of all unique elements in one column that share the same ID in another column?

Problem description

I have a data set all_transcripts with a column ConvID and a column Name:

>all_transcripts


ConvID  Name
    5   Guest
    5   Guest      
    5   Agent      
    5   Guest     
    5   Agent      
    6   Reception      
    6   Guest  
    6   Agent 
    6   Guest      
    6   Guest      
    7   Reception     
    7   Reception     
    7   Guest 
    7   Guest      
    7   Reception        
    8   Reception      
    8   Guest      
    8   Agent      

I want to get the unique names for each ConvID.

My desired output looks like this:

5 ['Guest','Agent']
6 ['Reception','Guest','Agent']
7 ['Reception','Guest']
8 ['Reception','Guest','Agent']

To do this, I tried the aggregate function as follows:

aggregate(interactionId~name, all_transcripts, FUN= 'unique')

But this doesn't work. How can I change my code to get the desired output?

Tags: r, list, aggregate, unique

Solution


tidyverse solution

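Note that this approach returns unique_names as a list-column (each cell holds a character vector of names) rather than as a plain character column with one string per row. Depending on your needs, this may or may not be preferable.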

library(tidyverse, warn.conflicts = FALSE)

all_transcripts %>%
  nest(data = -ConvID) %>%   # tidyr >= 1.0 syntax; older versions accepted nest(-ConvID)
  mutate(unique_names = map(data, ~ unique(.x$Name))) %>%  # unique names within each nested group
  select(-data)
#>   ConvID            unique_names
#> 1      5            Guest, Agent
#> 2      6 Reception, Guest, Agent
#> 3      7        Reception, Guest
#> 4      8 Reception, Guest, Agent
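The list-column versus plain-string distinction can also be seen without any packages. The base R sketch below first builds a named list holding the unique names per ConvID, then collapses each element into a single string. The ", " separator and the variable names unique_names and unique_str are arbitrary choices for illustration.

```r
# Sample data (same values as the question's all_transcripts)
all_transcripts <- data.frame(
  ConvID = c(5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L),
  Name = c("Guest", "Guest", "Agent", "Guest", "Agent",
           "Reception", "Guest", "Agent", "Guest", "Guest",
           "Reception", "Reception", "Guest", "Guest", "Reception",
           "Reception", "Guest", "Agent"),
  stringsAsFactors = FALSE
)

# Named list: one character vector of unique names per ConvID,
# in order of first appearance within each conversation
unique_names <- lapply(split(all_transcripts$Name, all_transcripts$ConvID), unique)

# Collapse each vector into one string, giving a plain named character vector
unique_str <- vapply(unique_names, paste, character(1), collapse = ", ")
print(unique_str)
```

Keep the list form if you want to iterate over individual names later; use the collapsed strings if you only need to display or export the result.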

data.table solution

library(data.table)

setDT(all_transcripts)

all_transcripts[, .(unique_names = list(unique(Name))), by = ConvID]
#>    ConvID          unique_names
#> 1:      5           Guest,Agent
#> 2:      6 Reception,Guest,Agent
#> 3:      7       Reception,Guest
#> 4:      8 Reception,Guest,Agent
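For comparison, here is a base R sketch that stays close to the question's aggregate() attempt. That call referred to columns that are not in the sample data (interactionId, and lowercase name instead of Name) and put the value column on the right of the formula; aggregate() expects value ~ group.

```r
# Sample data (same values as the question's all_transcripts)
all_transcripts <- data.frame(
  ConvID = c(5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L),
  Name = c("Guest", "Guest", "Agent", "Guest", "Agent",
           "Reception", "Guest", "Agent", "Guest", "Guest",
           "Reception", "Reception", "Guest", "Guest", "Reception",
           "Reception", "Guest", "Agent"),
  stringsAsFactors = FALSE
)

# Left-hand side: column to summarise; right-hand side: grouping column.
# Column names are case-sensitive, so it must be Name, not name.
res <- aggregate(Name ~ ConvID, data = all_transcripts, FUN = unique)

# The groups produce vectors of different lengths (2 or 3 names), so
# aggregate() cannot simplify them and stores Name as a list-column.
str(res)
```

If every group happened to have the same number of unique names, aggregate() would by default simplify the results into a matrix column instead, so check str() before relying on the list form.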

Data

# Plain data.frame; the data.table solution converts it with setDT()
all_transcripts <- data.frame(
  ConvID = c(5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L),
  Name = c("Guest", "Guest", "Agent", "Guest", "Agent",
           "Reception", "Guest", "Agent", "Guest", "Guest",
           "Reception", "Reception", "Guest", "Guest", "Reception",
           "Reception", "Guest", "Agent"),
  stringsAsFactors = FALSE
)
