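r - Count the frequency of all pairwise combinations by group
Problem description
I want to count the frequency of all pairwise combinations of item by group, that is, the number of groups in which each pair of items appears together.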
have <- data.frame(group=c("a", "a", "a",
"b", "b",
"c",
"d", "d",
"e", "e",
"f", "f", "f"),
item=c("apple", "banana", "black cherry",
"apple", "black cherry",
"orange",
"banana", "black cherry",
"banana", "black cherry",
"apple", "banana", "black cherry"))
have
# group item
# 1 a apple
# 2 a banana
# 3 a black cherry
# 4 b apple
# 5 b black cherry
# 6 c orange
# 7 d banana
# 8 d black cherry
# 9 e banana
# 10 e black cherry
# 11 f apple
# 12 f banana
# 13 f black cherry
# almost what I want...
# cons: repeats pairs and does not include zeros
library(dplyr)
have %>%
# https://stackoverflow.com/a/38335011/841405
full_join(have, by="group") %>%
group_by(item.x, item.y) %>%
summarise(length(unique(group))) %>%
filter(item.x!=item.y) %>%
mutate(item = paste(item.x, item.y, sep=", "))
# item.x item.y `length(unique(group))` item
# 1 apple banana 2 apple, banana
# 2 apple black cherry 3 apple, black cherry
# 3 banana apple 2 banana, apple
# 4 banana black cherry 4 banana, black cherry
# 5 black cherry apple 3 black cherry, apple
# 6 black cherry banana 4 black cherry, banana
# what I really want
# item.x item.y `length(unique(group))` item
# 1 apple banana 2 apple, banana
# 2 apple black cherry 3 apple, black cherry
# 3 apple orange 0 apple, orange
# 4 banana black cherry 4 banana, black cherry
# 5 banana orange 0 banana, orange
# 6 black cherry orange 0 black cherry, orange
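Solution
I use expand.grid to generate every combination of items, join that to what you have already built, and then fill the unmatched rows with zeros. I also renamed your count column to n.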
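library(dplyr)

# Count the distinct groups in which each ordered pair of items co-occurs.
have2 = have %>%
  full_join(have, by = "group") %>%
  group_by(item.x, item.y) %>%
  summarise(n = length(unique(group))) %>%
  ungroup() %>%
  filter(item.x != item.y) %>%
  mutate(item = paste(item.x, item.y, sep = ", "))

# Build every unordered pair once. expand.grid() converts strings to factors
# (made explicit here), so comparing the integer codes keeps one ordering per pair.
combos = expand.grid(item.x = unique(have$item),
                     item.y = unique(have$item),
                     stringsAsFactors = TRUE) %>%
  filter(as.numeric(item.x) < as.numeric(item.y)) %>%
  mutate(item = paste(item.x, item.y, sep = ", ")) %>%
  arrange(item.x, item.y) %>%
  # Join the observed counts, then set pairs that never co-occur to zero.
  left_join(have2, by = c("item.x", "item.y", "item")) %>%
  mutate(n = replace(n, is.na(n), 0))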
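As a cross-check on the join-based approach, the same counts can probably also be obtained in base R from a group-by-item incidence matrix. This is only a sketch of an alternative technique, not the method used in the answer above; the names incidence, co_occurrence, and pair_counts are made up for illustration.

```r
# Same example data as in the question.
have <- data.frame(
  group = c("a", "a", "a", "b", "b", "c", "d", "d", "e", "e", "f", "f", "f"),
  item  = c("apple", "banana", "black cherry",
            "apple", "black cherry",
            "orange",
            "banana", "black cherry",
            "banana", "black cherry",
            "apple", "banana", "black cherry")
)

# Group-by-item incidence matrix: 1 where the group contains the item, else 0.
counts <- unclass(table(have$group, have$item))
incidence <- (counts > 0) * 1

# crossprod(m) is t(m) %*% m, so cell [i, j] is the number of groups
# that contain both item i and item j.
co_occurrence <- crossprod(incidence)

# Read only the upper triangle so each unordered pair appears once.
idx <- which(upper.tri(co_occurrence), arr.ind = TRUE)
pair_counts <- data.frame(
  item.x = rownames(co_occurrence)[idx[, "row"]],
  item.y = colnames(co_occurrence)[idx[, "col"]],
  n      = co_occurrence[idx]
)
pair_counts <- pair_counts[order(pair_counts$item.x, pair_counts$item.y), ]
pair_counts$item <- paste(pair_counts$item.x, pair_counts$item.y, sep = ", ")
```

Pairs that never share a group come out as zeros without a separate fill step, because every item already has a row and a column in the matrix. One caveat of this sketch is that the matrix grows with the square of the number of distinct items.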