r - Unique element pairing between two vectors maximizing the overall sum
问题描述
I have a data frame that contains all the possible combinations between the elements of two vectors and for each combination I have a corresponding score. I was trying to find an efficient way to find the subset of unique pairs with unique elements (i.e., an element from one vector can be found only once across all pairs) that maximizes the sum of scores corresponding to each combination.
As example data, consider this df
:
df = data.frame(Var1 = c("A", "B", "C"), Var2 = c("A", "C", "D"))
df = expand.grid(df$Var1, df$Var2)
df$score = c(1, 0.5, 2, 1, 0.5, 0.5, 1, 2, 1)
> df
Var1 Var2 score
1 A A 1.0
2 B A 0.5
3 C A 2.0
4 A C 1.0
5 B C 0.5
6 C C 0.5
7 A D 1.0
8 B D 2.0
9 C D 1.0
>
The expected result would be:
A C 1
B D 2
C A 2
Note that you can have overlap between the elements of the two vectors, but again each element from each vector should appear only once. Also, the pair A A 1
is allowed and would've been possible, but that would make it impossible then to generate the pair C A 2
which would increase the overall sum of the score
.
As an attempt I have used this one liner with the functionality from dplyr
df <- df %>% group_by(Var1) %>% slice(which.max(score)) %>% as.data.frame()
which produces:
> df
Var1 Var2 score
1 A A 1
2 B D 2
3 C A 2
which is close enough.. but the A
from the second vector is repeated. Do you have any suggestions? Thank you in advance!
解决方案
好吧,我最终找到了基于R包的函数中实现的匈牙利算法的解决方案。为了让它工作,像这样转换你的矩阵:solve_LSAP
clue
df
df = matrix(sapply(df$score, function(x) x), nrow=length(unique(df$Var1)), ncol=length(unique(df$Var2)), dimnames = list(unique(df$Var1), unique(df$Var2)))
并应用该功能
df.res = solve_LSAP(df, maximum = T)
> df.res
Optimal assignment:
1 => 2, 2 => 3, 3 => 1
然后取回实际的节点或名称
df.res = cbind(rownames(df), colnames(df)[df.res])
> df.res
[,1] [,2]
[1,] "A" "C"
[2,] "B" "D"
[3,] "C" "A"
>
太棒了!
推荐阅读
- git - 从 Databricks Notebook 运行 git 命令
- jenkins - groovy 脚本没有获取 aws 图像列表,可能是什么问题?
- python - aws lambda api 网关中的 Web 套接字处理大量数据
- c++ - 如何修复 C++ 错误:在抛出“std::bad_alloc”实例后调用终止。什么():std::bad_alloc
- docker - 来自守护进程的错误响应:hcsshim::CreateComputeSystem: The request is not supported
- angular - Angular 11.1.2 之后的路由器延迟加载问题
- mqtt - 尝试调试从 NodeRed 项目发送到 myqtthub 的请求
- vue.js - 从 localhost 访问 Magento 2.3.6 REST API - 被 CORS 策略阻止
- dataframe - 根据条件对同一列中的值求和
- javascript - 何时在哈巴狗中使用字符串插值,未转义!{}?