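r - Count the number of instances in which more than two strings co-occur
Problem description
This is an extension of the question here.
I have a data frame like this: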
df<-structure(list(person = c("p1", "p1", "p1", "p1", "p1", "p1",
"p1", "p2", "p2", "p2", "p3", "p3", "p3", "p4", "p4", "p4", "p5",
"p5", "p5", "p6", "p6", "p6", "p7", "p7", "p7"), hp_char = c("hp1",
"hp2", "hp3", "hp4", "hp5", "hp6", "hp7", "hp8", "hp9", "hp10",
"hp1", "hp2", "hp3", "hp5", "hp6", "hp7", "hp8", "hp9", "hp10",
"hp3", "hp4", "hp5", "hp1", "hp2", "hp3")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -25L), .Names = c("person",
"hp_char"), spec = structure(list(cols = structure(list(person = structure(list(), class = c("collector_character",
"collector")), hp_char = structure(list(), class = c("collector_character",
"collector"))), .Names = c("person", "hp_char")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Using the really helpful self-join/data.table answer provided by Uwe, I get the number of co-occurrences of any two "hp_char" values as follows:
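library(data.table)
# self-join on person, then keep each unordered pair once (hp_char < i.hp_char)
df_by2 <- setDT(df)[df, on = "person", allow.cartesian = TRUE][
  hp_char < i.hp_char, .N, by = .(HP_ID1 = hp_char, HP_ID2 = i.hp_char)]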
This gives me:
HP_ID1 HP_ID2 N
1: hp1 hp2 3
2: hp1 hp3 3
3: hp2 hp3 3
4: hp1 hp4 1
5: hp2 hp4 1
6: hp3 hp4 2
7: hp1 hp5 1
8: hp2 hp5 1
9: hp3 hp5 2
10: hp4 hp5 2
11: hp1 hp6 1
12: hp2 hp6 1
13: hp3 hp6 1
14: hp4 hp6 1
15: hp5 hp6 2
16: hp1 hp7 1
17: hp2 hp7 1
18: hp3 hp7 1
19: hp4 hp7 1
20: hp5 hp7 2
21: hp6 hp7 2
22: hp10 hp8 2
23: hp8 hp9 2
24: hp10 hp9 2
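However, I would like to know whether this approach can be extended to count co-occurrences of more than two "hp_char" values. In other words, I am looking for output like this (for example, the number of times three values occur together):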
HP_ID1 HP_ID2 HP_ID3 N
1 hp1 hp2 hp3 3
2 hp3 hp4 hp5 2
3 hp5 hp6 hp7 2
4 hp8 hp9 hp10 2
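So far I have found several solutions for two co-occurring events, but they do not seem to generalize to counting instances of more than two events. Thanks for your help!
Solution
It is cleaner if you use a combinations approach: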
library(data.table)
setDT(df)
nset <- 3
cols <- paste0("hp_char", seq_len(nset))

# all nset-way combinations (ordered, repeats allowed) of the distinct hp_char values
combi <- do.call(CJ, rep(df[, .(unique(hp_char))], nset))
setnames(combi, cols)

# for each person, all nset-way combinations of that person's hp_char values
nsetSkills <- df[, do.call(CJ, rep(.(hp_char), nset)), by = .(person)]
setnames(nsetSkills, names(nsetSkills)[-1L], cols)

# join the two sets and count the matching persons for each row in combi
ans <- nsetSkills[combi, on = cols, .N, by = .EACHI]
ans
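Output (CJ builds every ordered tuple, including repeats such as hp1 hp1 hp1, so the 10 distinct values give 10^3 = 1000 rows):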
hp_char1 hp_char2 hp_char3 N
1: hp1 hp1 hp1 3
2: hp1 hp1 hp10 0
3: hp1 hp1 hp2 3
4: hp1 hp1 hp3 3
5: hp1 hp1 hp4 1
---
996: hp9 hp9 hp5 0
997: hp9 hp9 hp6 0
998: hp9 hp9 hp7 0
999: hp9 hp9 hp8 2
1000: hp9 hp9 hp9 2
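The cross-join output above lists every ordered tuple, whereas the question asks for one row per unordered set. One possible way to get that shape, sketched here under assumptions rather than taken from the original answer, is to enumerate each person's sets with base R's `combn` and count them. The names `sets`, `set_cols` and `set_counts` are made up for this illustration, and `df` is rebuilt compactly with the same person/hp_char pairs as in the question.

```r
library(data.table)

# Same person/hp_char pairs as the question's df, rebuilt compactly
df <- data.table(
  person  = rep(paste0("p", 1:7), times = c(7, 3, 3, 3, 3, 3, 3)),
  hp_char = paste0("hp", c(1:7, 8:10, 1:3, 5:7, 8:10, 3:5, 1:3))
)

nset <- 3
set_cols <- paste0("HP_ID", seq_len(nset))  # hypothetical output column names

# For each person, enumerate every unordered set of nset hp_char values.
# Sorting first (radix = C-locale order) puts the members of a set in the
# same column order for every person, so identical sets compare equal.
sets <- df[, {
  x <- sort(unique(hp_char), method = "radix")
  if (length(x) >= nset) as.data.table(t(combn(x, nset))) else NULL
}, by = person]
setnames(sets, paste0("V", seq_len(nset)), set_cols)

# Count how many persons share each set, most frequent first
set_counts <- sets[, .N, by = set_cols][order(-N)]

# Sets shared by more than one person
set_counts[N > 1]
```

Because the values are sorted as strings, "hp10" comes before "hp8", so the last set from the question appears as hp10 hp8 hp9 rather than hp8 hp9 hp10. This sketch only builds the sets that actually occur, which keeps the intermediate table much smaller than the full cross join when there are many distinct values. Alternatively, the cross-join result itself can presumably be reduced to the same sets with a filter such as `ans[hp_char1 < hp_char2 & hp_char2 < hp_char3 & N > 0]`.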