Create a matrix using the common information in two lists

Problem description

I have two large lists with the same structure as the toy examples shown in this question.

dput(head(list1)):

list(FEB_GAMES = list(GAME1 = c("Stan", "Kenny", "Cartman", "Kyle", 
"Butters"), GAME2 = c("Kenny", "Cartman", "Kyle", "Butters")), 
MAR_GAMES = list(GAME3 = c("Stan", "Kenny", "Cartman", "Butters"
), GAME4 = c("Kenny", "Cartman", "Kyle", "Butters")))

dput(head(list2)):

list(first = c("Stan", "Kenny", "Cartman", "Kyle", "Butters", 
"Kenny", "Cartman", "Kyle", "Butters"), second = c("Stan", "Kenny", 
"Cartman", "Wendy", "Ike"), third = c("Randy", "Randy", "Randy", 
"Randy"))

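I want to turn these two lists into one large data.frame/matrix. The row names will come from list1 (GAME1, GAME2, GAME3, GAME4). The column names will be the names of the elements of list2 (first, second, third). Each cell will hold an integer: the number of times a character (a name) common to both lists is found. For example, GAME1 x first contains 9 common characters, while GAME1 x third contains 0.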


The output would look like this:

        first  second  third
GAME1   9      3       0
GAME2   8      2       0
GAME3   7      3       0
GAME4   8      2       0

So the value in [1,1] would be the total number of times a common character is found between the GAME1 vector from list1 and the vector named first in list2.
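As a quick check of this counting rule for a single cell, here is a minimal sketch in base R. It assumes each game lists a player at most once, so that counting the entries of the list2 vector that also occur in the game vector gives the cell value; the variable names `game1`, `first` and `third` are just local copies of the toy data.

```r
# Toy vectors copied from the example above
game1 <- c("Stan", "Kenny", "Cartman", "Kyle", "Butters")
first <- c("Stan", "Kenny", "Cartman", "Kyle", "Butters",
           "Kenny", "Cartman", "Kyle", "Butters")
third <- rep("Randy", 4)

# Assumption: no player is repeated within a game, so each entry of the
# list2 vector that appears in the game counts exactly once
cell_first <- sum(first %in% game1)
cell_third <- sum(third %in% game1)

cell_first  # 9
cell_third  # 0
```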

Note: the vectors in list1 and list2 have different numbers of values.

Tags: r, list, dataframe, sum

Solution

One option is to flatten 'list1' first, do a merge after converting both lists to a data.frame (with stack), and then call table

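# Flatten one level so that every game becomes a top-level list element
list1a <- do.call(c, list1)
# Drop the "FEB_games." / "MAR_games." prefixes that c() adds to the names
names(list1a) <- sub(".*\\.", "", names(list1a))
# stack() gives (values, ind) data frames; join on the name, drop it, cross-tabulate
out <- table(merge(stack(list1a), stack(list2), by = 'values')[-1])
# Remove the "ind.x" / "ind.y" dimension labels from the printed table
names(dimnames(out)) <- NULL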
out
#      first second third
#GAME1     9      3     0
#GAME2     8      2     0
#GAME3     7      3     0
#GAME4     8      2     0
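As a cross-check on the merge/table result, the same matrix can be built without a join by counting equal pairs directly. This is only a sketch under the assumption that list1 is nested exactly one level deep, as in the Data section; `count_pairs` and `check` are hypothetical names introduced here, not part of the answer.

```r
list1 <- list(
  FEB_games = list(GAME1 = c("Stan", "Kenny", "Cartman", "Kyle", "Butters"),
                   GAME2 = c("Kenny", "Cartman", "Kyle", "Butters")),
  MAR_games = list(GAME3 = c("Stan", "Kenny", "Cartman", "Butters"),
                   GAME4 = c("Kenny", "Cartman", "Kyle", "Butters")))
list2 <- list(
  first  = c("Stan", "Kenny", "Cartman", "Kyle", "Butters",
             "Kenny", "Cartman", "Kyle", "Butters"),
  second = c("Stan", "Kenny", "Cartman", "Wendy", "Ike"),
  third  = rep("Randy", 4))

# Flatten one level and strip the month prefixes, as in the answer
list1a <- do.call(c, list1)
names(list1a) <- sub(".*\\.", "", names(list1a))

# Hypothetical helper: number of (a[i], b[j]) pairs holding the same string,
# which is the number of rows an inner join on the values would produce
count_pairs <- function(a, b) sum(outer(a, b, "=="))

# Rows follow list1a, columns follow list2
check <- sapply(list2, function(b) sapply(list1a, count_pairs, b = b))
check
#       first second third
# GAME1     9      3     0
# GAME2     8      2     0
# GAME3     7      3     0
# GAME4     8      2     0
```

Comparing every pair is quadratic in the vector lengths, so for large lists the merge-based version is likely the better choice; the sketch is mainly useful for verifying individual cells.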

We can also do this with tidyverse using the same logic

library(tidyverse)
list1 %>% 
    flatten %>% 
    enframe %>% 
    unnest(cols = value) %>%   # tidyr >= 1.0 expects the columns to be named
    full_join(list2 %>% 
                enframe %>%
                unnest(cols = value), by = 'value') %>% 
    select(-value) %>% 
    count(name.x, name.y) %>% 
    spread(name.y, n, fill = 0) %>%
    filter(!is.na(name.x))
# A tibble: 4 x 4   
#  name.x first second third
#  <chr>  <dbl>  <dbl> <dbl>
#1 GAME1      9      3     0
#2 GAME2      8      2     0
#3 GAME3      7      3     0
#4 GAME4      8      2     0

Data

list1 <- list(FEB_games = list(GAME1 = c("Stan", "Kenny", "Cartman", "Kyle", 
"Butters"), GAME2 = c("Kenny", "Cartman", "Kyle", "Butters")), 
MAR_games = list(GAME3 = c("Stan", "Kenny", "Cartman", "Butters"
), GAME4 = c("Kenny", "Cartman", "Kyle", "Butters")))

list2 <- list(first = c("Stan", "Kenny", "Cartman", "Kyle", "Butters", 
 "Kenny", "Cartman", "Kyle", "Butters"), second = c("Stan", "Kenny", 
 "Cartman", "Wendy", "Ike"), third = c("Randy", "Randy", "Randy", 
"Randy"))
