r - Counting combinations of values in an R data frame
Problem description
I have a data frame like this:
df<-structure(list(id = c("A", "A", "A", "B", "B", "C", "C", "D",
"D", "E", "E"), expertise = c("r", "python", "julia", "python",
"r", "python", "julia", "python", "julia", "r", "julia")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -11L), .Names = c("id",
"expertise"), spec = structure(list(cols = structure(list(id = structure(list(), class = c("collector_character",
"collector")), expertise = structure(list(), class = c("collector_character",
"collector"))), .Names = c("id", "expertise")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
df
id expertise
1 A r
2 A python
3 A julia
4 B python
5 B r
6 C python
7 C julia
8 D python
9 D julia
10 E r
11 E julia
I can get the overall count of each "expertise" value with:
library(dplyr)
df %>% group_by(expertise) %>% mutate(counts_overall = n())
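However, what I want is a count of combinations of expertise values. In other words, how many ids share the same pair of expertise values, e.g. "r" and "julia"? This is the desired output: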
df_out<-structure(list(expertise1 = c("r", "r", "python"), expertise2 = c("python",
"julia", "julia"), count = c(2L, 2L, 3L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L), .Names = c("expertise1",
"expertise2", "count"), spec = structure(list(cols = structure(list(
expertise1 = structure(list(), class = c("collector_character",
"collector")), expertise2 = structure(list(), class = c("collector_character",
"collector")), count = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("expertise1", "expertise2", "count"
)), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
df_out
expertise1 expertise2 count
1 r python 2
2 r julia 2
3 python julia 3
Solution
The linked answer from latemail's comment creates a matrix
crossprod(table(df) > 0)
         expertise
expertise julia python r
   julia      4      3 2
   python     3      4 2
   r          2      2 3
whereas the OP needs a data frame in long format.
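As a minimal sketch of how that matrix could be reshaped into long format with base R only: the upper triangle holds each unordered pair exactly once, so its cells can be read off with their row and column names. The plain data.frame() copy of the question's data and the names m, idx and pairs are assumptions made for this illustration, not part of the original answer.

```r
# Plain data.frame copy of the question's data (assumption: same values as df)
df <- data.frame(
  id = c("A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "E"),
  expertise = c("r", "python", "julia", "python", "r", "python",
                "julia", "python", "julia", "r", "julia"),
  stringsAsFactors = FALSE
)

# Co-occurrence matrix: cell [i, j] counts the ids that have both values
m <- crossprod(table(df) > 0)

# Positions of the upper triangle, i.e. each unordered pair once, no diagonal
idx <- which(upper.tri(m), arr.ind = TRUE)

# Read the pair labels and counts off those positions
pairs <- data.frame(
  expertise1 = rownames(m)[idx[, "row"]],
  expertise2 = colnames(m)[idx[, "col"]],
  count = m[idx],
  stringsAsFactors = FALSE
)
pairs
```

The diagonal of the matrix, which counts the ids per single expertise value, is dropped on purpose because the question only asks for pairs.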
1) Cross join
Here is a data.table solution that uses the CJ() (cross join) function:
library(data.table)
setDT(df)[, CJ(expertise, expertise)[V1 < V2], by = id][
, .N, by = .(expertise1 = V1, expertise2 = V2)]
   expertise1 expertise2 N
1:      julia     python 3
2:      julia          r 2
3:     python          r 2
CJ(expertise, expertise)[V1 < V2] is the data.table equivalent of t(combn(df$expertise, 2)) or combinat::combn2(df$expertise).
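To make that equivalence concrete, here is a hedged base R sketch that builds the pairs per id with combn() and then counts them. It assumes every id has at least two distinct expertise values (combn() fails otherwise) and sorts the values first so that each pair always appears in the same order. The plain data.frame() copy of the data and the names pairs_per_id, all_pairs and counts are made up for this example.

```r
# Plain data.frame copy of the question's data (assumption: same values as df)
df <- data.frame(
  id = c("A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "E"),
  expertise = c("r", "python", "julia", "python", "r", "python",
                "julia", "python", "julia", "r", "julia"),
  stringsAsFactors = FALSE
)

# For each id, list all unordered pairs of its (sorted, distinct) values
pairs_per_id <- lapply(split(df$expertise, df$id), function(x) {
  t(combn(sort(unique(x)), 2))
})

# Stack the per-id pair matrices into one two-column data frame
all_pairs <- as.data.frame(do.call(rbind, pairs_per_id),
                           stringsAsFactors = FALSE)
names(all_pairs) <- c("expertise1", "expertise2")

# Count how often each pair occurs across ids
counts <- aggregate(count ~ expertise1 + expertise2,
                    data = transform(all_pairs, count = 1L),
                    FUN = sum)
counts
```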
2) Self-join
Here is another variant that uses a self-join:
library(data.table)
setDT(df)[df, on = "id", allow.cartesian = TRUE][
expertise < i.expertise, .N, by = .(expertise1 = expertise, expertise2 = i.expertise)]
   expertise1 expertise2 N
1:     python          r 2
2:      julia          r 2
3:      julia     python 3
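The same self-join idea can be sketched without data.table, using base R's merge(). This is only an illustration under the assumption that a plain data.frame copy of the data is used; merge() names the two expertise columns expertise.x and expertise.y, and the names joined and counts are hypothetical.

```r
# Plain data.frame copy of the question's data (assumption: same values as df)
df <- data.frame(
  id = c("A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "E"),
  expertise = c("r", "python", "julia", "python", "r", "python",
                "julia", "python", "julia", "r", "julia"),
  stringsAsFactors = FALSE
)

# Join the table to itself on id: every pairing of values within an id
joined <- merge(df, df, by = "id")

# Keep each unordered pair once (and drop pairings of a value with itself)
joined <- joined[joined$expertise.x < joined$expertise.y, ]

# Tabulate the pairs and drop combinations that never occur
counts <- as.data.frame(
  table(expertise1 = joined$expertise.x, expertise2 = joined$expertise.y),
  responseName = "count",
  stringsAsFactors = FALSE
)
counts <- counts[counts$count > 0, ]
counts
```

The filter expertise.x < expertise.y plays the same role as expertise < i.expertise in the data.table version: without it, every pair would be counted twice and each value would also be paired with itself.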