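r - Create a tidy data frame of co-occurrences: three columns for a co-occurrence network, built from a list of character vectors of unequal length
Problem description
I need help finding a way to structure data for use with R network packages.
I have a list, author_list, in which each character vector contains several authors, for example: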
document_authors1 = c("King, Stephen", "Martin, George", "Clancy, Tom")
document_authors2 = c("Clancy, Tom", "Patterson, James", "Stine, RL", "King, Stephen")
document_authors3 = c("Clancy, Tom", "Patterson, James", "Stine, RL", "King, Stephen")
author_list = list(document_authors1, document_authors2, document_authors3)
author_list
[[1]]
[1] "King, Stephen"  "Martin, George" "Clancy, Tom"

[[2]]
[1] "Clancy, Tom"      "Patterson, James" "Stine, RL"        "King, Stephen"

[[3]]
[1] "Clancy, Tom"      "Patterson, James" "Stine, RL"        "King, Stephen"
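I need to build a data frame from author_list with three columns. The first two columns hold author names: in each row, col1 has one author and col2 has another. The third column, co-occurrence, gives the number of documents in which that pair of authors (col1 and col2 of the row) appears together. For example,
  col1            col2              co-occurrence
1 King, Stephen   Patterson, James              2
2 Martin, George  Clancy, Tom                   1
and so on...
I have been trying to find a package function that does this, but with no luck. I have also been trying to piece together a solution step by step, but it keeps eluding me. Hopefully it is easier than I think. Any comments or suggestions would be greatly appreciated.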
Solution
I am not entirely sure this is what you are after, but hopefully it helps.
library(dplyr)
# Only include elements in list with more than one author
author_list <- author_list[lengths(author_list) > 1]
# Identify every combination of pairs of authors for each element in list
mat <- do.call(rbind, lapply(author_list, function(x) t(combn(x, 2))))
# Within each row sort alphabetically, so "A, B" and "B, A" count as the same pair
mat <- t(apply(mat, 1, sort))
# Count up pairs of authors
as.data.frame(mat) %>%
  group_by_all() %>%
  summarise(count = n())
# A tibble: 8 x 3
# Groups:   V1 [3]
  V1               V2               count
  <fct>            <fct>            <int>
1 Clancy, Tom      King, Stephen        3
2 Clancy, Tom      Martin, George       1
3 Clancy, Tom      Patterson, James     2
4 Clancy, Tom      Stine, RL            2
5 King, Stephen    Martin, George       1
6 King, Stephen    Patterson, James     2
7 King, Stephen    Stine, RL            2
8 Patterson, James Stine, RL            2
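If you would rather avoid the dplyr dependency and get the column names asked for in the question, the same idea can be sketched in base R. This is only a rough sketch under a few assumptions: count_author_pairs is a hypothetical helper name, the third column is called cooccurrence (without the hyphen, so it is a syntactic R name), and repeated names within one document are treated as a single author.

```r
# Rebuild the example input from the question
author_list <- list(
  c("King, Stephen", "Martin, George", "Clancy, Tom"),
  c("Clancy, Tom", "Patterson, James", "Stine, RL", "King, Stephen"),
  c("Clancy, Tom", "Patterson, James", "Stine, RL", "King, Stephen")
)

# Hypothetical helper (not part of any package): count how many
# documents each unordered pair of authors shares
count_author_pairs <- function(authors) {
  # Assumption: a name repeated within one document counts once
  authors <- lapply(authors, unique)
  # A document needs at least two authors to contribute a pair
  authors <- authors[lengths(authors) > 1]
  # Sorting before combn() puts every pair in the same order,
  # so "A, B" and "B, A" end up as the same row
  pairs <- do.call(rbind, lapply(authors, function(x) t(combn(sort(x), 2))))
  pairs <- data.frame(col1 = pairs[, 1], col2 = pairs[, 2],
                      stringsAsFactors = FALSE)
  # One row per pair per document; summing the 1s counts the documents
  pairs$cooccurrence <- 1L
  aggregate(cooccurrence ~ col1 + col2, data = pairs, FUN = sum)
}

edges <- count_author_pairs(author_list)
edges
```

The result is an ordinary edge list with one row per author pair, so it should be usable as input for network packages. For example, assuming igraph is installed, igraph::graph_from_data_frame(edges, directed = FALSE) would read the first two columns as the endpoints and keep cooccurrence as an edge attribute.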