r - Iterate over ID column, creating a new graph for each unique ID
Problem description
Imagine I have the following data:
dat <- read.table(text="TrxID Items Quant Team_Id
Trx1 A 3 11
Trx1 B 1 11
Trx1 C 1 12
Trx2 E 3 13
Trx2 B 1 13
Trx3 B 1 14
Trx3 C 4 14
Trx4 D 1 15
Trx4 E 1 15
Trx4 A 1 15
Trx5 F 5 18
Trx5 B 3 13
Trx5 C 2 19
Trx5 D 1 20", header=T)
dat[1, ]$Team_Id <- paste0(c('11','19'), collapse = ',')
dat[6, ]$Team_Id <- paste0(c('14','13'), collapse = ',')
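Some people are on more than one team, so their Team_Id holds several IDs in one field (here as a comma-separated string). I can generate an adjacency matrix of all the co-occurrences and turn it into a graph to perform network analysis like so:
library(Matrix)  # needed for xtabs(..., sparse = TRUE)
library(igraph)  # graph.adjacency(), simplify()
tabbed <- xtabs(~ TrxID + Items, data=dat, sparse = TRUE)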
co_occur <- crossprod(tabbed, tabbed)
diag(co_occur) <- 0
co_occur
g <- graph.adjacency(co_occur, weighted=TRUE, mode ='undirected')
g <- simplify(g)
However, what I want to do is to group by the team_id column, and to generate the above adjacency matrix and graph objects for every unique team_id. I tried using a for loop to achieve this, but I don't believe it is feasible given the size of my dataset. Moreover, it cannot handle the cases where people are on more than one team (as it would require another for loop to iterate over each element in a list).
For example,
complete_teams <- data.frame(team_id = c(11, 12, 13, 14, 15, 18, 19, 20))
for (i in complete_teams$team_id) {
  if (i %in% dat$Team_Id) {
    newdata <- subset(dat, Team_Id == i)
    tabbed <- xtabs(~ TrxID + Items, data = newdata, sparse = TRUE)
    co_occur <- crossprod(tabbed, tabbed)
    diag(co_occur) <- 0
    print(co_occur)
    g <- graph.adjacency(co_occur, weighted = TRUE, mode = 'undirected')
    g <- simplify(g)
  }
}
So, what I'm wondering is:
- What is the best way to generate separate networks for each team_id?
- How should the resultant graph objects for each team_id be stored in order to do analysis on them later?
If there is a more obvious way of doing this within the network analysis paradigm, please let me know.
Solution
Here is one way to do it using by. First, though, I pre-process the data to split the comma-separated column.
create_g <- function(dx){
  # transaction-by-item incidence matrix (sparse)
  tabbed <- xtabs(~ TrxID + Items, data=dx, sparse = TRUE)
  # item-by-item co-occurrence counts
  co_occur <- crossprod(tabbed, tabbed)
  diag(co_occur) <- 0
  g <- graph.adjacency(co_occur, weighted=TRUE, mode ='undirected')
  g <- simplify(g)
  g
}
I use data.table to split the column, since this is a grouping operation by ID:
library(data.table)
out <- setDT(dat)[, {
  data.table(new_id = unlist(strsplit(Team_Id, ",")),
             .SD)
}, Team_Id]
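If you would rather not depend on data.table for this step, the same row expansion can be sketched in base R. This is only an illustration under assumptions: the small data frame below is made up, and it assumes Team_Id is a character column with IDs separated by plain commas.

```r
# Hypothetical toy data: some rows carry two team ids in one string
dat <- data.frame(
  TrxID   = c("Trx1", "Trx1", "Trx3"),
  Items   = c("A", "B", "B"),
  Team_Id = c("11,19", "11", "14,13"),
  stringsAsFactors = FALSE
)

# Split each Team_Id string into its individual ids
ids <- strsplit(dat$Team_Id, ",", fixed = TRUE)

# Repeat every row once per id it belongs to, then attach the single id
out <- dat[rep(seq_len(nrow(dat)), lengths(ids)), ]
out$new_id <- unlist(ids)
rownames(out) <- NULL

print(out)
```

Each row that belongs to several teams now appears once per team, so any later grouping on new_id sees it in every relevant group.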
We can't use the data.table framework again to apply create_g, since the result is not a nested list:
by(out,out$new_id,FUN=create_g)
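As for keeping the graphs around for later analysis, one option is a plain named list with one element per team. Below is a minimal sketch of that idea using split() and lapply() instead of by(). The toy data and the name build_graph are made up for illustration; build_graph follows the same steps as create_g, and graph_from_adjacency_matrix() is the current igraph name for graph.adjacency().

```r
library(Matrix)  # sparse matrix support for xtabs(sparse = TRUE)
library(igraph)

# Hypothetical toy data, already expanded to one row per team (like `out`)
out <- data.frame(
  TrxID  = c("Trx1", "Trx1", "Trx2", "Trx2", "Trx2", "Trx3", "Trx3"),
  Items  = c("A", "B", "A", "B", "C", "B", "C"),
  new_id = c("11", "11", "11", "11", "11", "13", "13"),
  stringsAsFactors = FALSE
)

# Same steps as create_g, under an illustrative name
build_graph <- function(dx) {
  tabbed <- xtabs(~ TrxID + Items, data = dx, sparse = TRUE)
  co_occur <- crossprod(tabbed)  # item-by-item co-occurrence counts
  diag(co_occur) <- 0            # drop self co-occurrence
  g <- graph_from_adjacency_matrix(co_occur, weighted = TRUE,
                                   mode = "undirected")
  simplify(g)
}

# One graph per team, stored in a list named by team id
graphs <- lapply(split(out, out$new_id), build_graph)

# Later analysis: look up one team, or summarise all teams at once
g11 <- graphs[["11"]]
edge_counts <- sapply(graphs, ecount)
print(edge_counts)
```

A single team's graph can then be retrieved by its id, e.g. graphs[["13"]], and any igraph function can be mapped over the whole list with lapply() or sapply().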