首页 > 解决方案 > Unique element pairing between two vectors maximizing the overall sum

问题描述

I have a data frame that contains all the possible combinations between the elements of two vectors and for each combination I have a corresponding score. I was trying to find an efficient way to find the subset of unique pairs with unique elements (i.e., an element from one vector can be found only once across all pairs) that maximizes the sum of scores corresponding to each combination. As example data, consider this df:

df = data.frame(Var1 = c("A", "B", "C"), Var2 = c("A", "C", "D"))
df = expand.grid(df$Var1, df$Var2)
df$score = c(1, 0.5, 2, 1, 0.5, 0.5, 1, 2, 1)
> df
  Var1 Var2 score
1    A    A   1.0
2    B    A   0.5
3    C    A   2.0
4    A    C   1.0
5    B    C   0.5
6    C    C   0.5
7    A    D   1.0
8    B    D   2.0
9    C    D   1.0
> 

The expected result would be:

A  C  1
B  D  2
C  A  2

Note that you can have overlap between the elements of the two vectors, but again each element from each vector should appear only once. Also, the pair A A 1 is allowed and would've been possible, but that would make it impossible then to generate the pair C A 2 which would increase the overall sum of the score. As an attempt I have used this one liner with the functionality from dplyr

df <- df %>% group_by(Var1) %>% slice(which.max(score)) %>% as.data.frame()

which produces:

> df
  Var1 Var2 score
1    A    A     1
2    B    D     2
3    C    A     2

which is close enough.. but the A from the second vector is repeated. Do you have any suggestions? Thank you in advance!

标签: runique-constraintpairingmaximization

解决方案


好吧,我最终找到了基于R包的函数中实现的匈牙利算法的解决方案。为了让它工作,像这样转换你的矩阵:solve_LSAPcluedf

df = matrix(sapply(df$score, function(x) x), nrow=length(unique(df$Var1)), ncol=length(unique(df$Var2)), dimnames = list(unique(df$Var1), unique(df$Var2)))

并应用该功能

df.res = solve_LSAP(df, maximum = T)
> df.res
Optimal assignment:
1 => 2, 2 => 3, 3 => 1

然后取回实际的节点或名称

df.res = cbind(rownames(df), colnames(df)[df.res])
> df.res
     [,1] [,2]
[1,] "A"  "C" 
[2,] "B"  "D" 
[3,] "C"  "A" 
> 

太棒了!


推荐阅读