python - 组之间的平衡洗牌

问题描述

我正在尝试在 python 中为以下问题编写算法：

给定这两个长度相等的数组，其中的对象y是唯一的

x = (1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7)
y = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M')

将每个对象随机分配到重复次数y中的位置x
24

例如

[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
['A', 'M', 'E', 'D', 'G', 'L', 'K', 'J', 'C', 'F', 'H', 'I', 'B']
['B', 'C', 'G', 'E', 'L', 'J', 'H', 'F', 'A', 'M', 'D', 'I', 'K']
['F', 'E', 'H', 'I', 'A', 'K', 'L', 'D', 'B', 'G', 'M', 'C', 'J']
['M', 'I', 'E', 'F', 'H', 'C', 'D', 'B', 'L', 'A', 'K', 'J', 'G']
.
.
.

但是，请执行随机分配，以便最终将每个对象以尽可能相等的数量y分配给每个唯一对象。x

例如对于13重复而不是24，分配计数将非常适合这样：

    A   B   C   D   E   F   G   H   I   J   K   L   M
1   2   2   2   2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2   2   2   2   2
3   2   2   2   2   2   2   2   2   2   2   2   2   2
4   2   2   2   2   2   2   2   2   2   2   2   2   2
5   2   2   2   2   2   2   2   2   2   2   2   2   2
6   2   2   2   2   2   2   2   2   2   2   2   2   2
7   1   1   1   1   1   1   1   1   1   1   1   1   1

请注意，列总和始终必须是重复次数。对于 24 次重复，我认为没有完美的解决方案，但是沿行的计数应该尽可能相等（只有微小的整数差异）

输出将是 24 次重复的“平衡洗牌”y

我试图编写一个蛮力解决方案，它迭代地添加一个打乱的 y 并在每次失去平衡时重新启动。它为更简单的变体找到了解决方案，但在这里失败了。也许您对这个问题有一个直接的解决方案？

更新
我写了一个蛮力算法，使用尽可能少的重复次数（len（y））找到最佳解决方案。但是它不能扩展到我需要的 y=len(13)。

def find_optimal_set(x, y):
    repeats = len(y)
    groups = set(x)
    while True:
        asig = {k:{k:0 for k in y} for k in groups}
        s = [random.sample(y, repeats) for i in range(repeats)]
        for r in s:
            for i, c in enumerate(r):
                asig[x[i]][c] +=1
        if all([len(set(v.values())) == 1 for v in asig.values()]):
            return(asig, s)

它适用于这两个示例（在几秒钟内）

x = (1, 1, 1, 2, 3, 3)
y = ('A', 'B', 'C', 'D', 'E', 'F')

x = (1, 1, 2, 2, 3)
y = ('A', 'B', 'C', 'D', 'E')

标签： pythonalgorithmshuffle

解决方案

一个简单的观察是，您可以选择的任何排列x作为初始分配，然后解决一系列分配问题，以确保每个后续分配都尽可能地保持平衡。

这是一个将其清除的python实现，

#!/usr/bin/python

"""
filename: random_assign.py
purpose:  demonstrate a straightforward solution to 
          https://stackoverflow.com/questions/63250967/balanced-shuffling-between-groups
"""


import networkx as nx
import random as rand

# Problem specification taken directly from OP in question
x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
y = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M']

#x = (1, 2, 3, 3)
#y = ('A', 'B', 'C', 'D')

x = map(str,x)

all_x = sorted(list(set(x)))

ny = len(y)
assert ny == len(x) #else something is terribly wrong

x_count = { v : sum( [ _x == v for _x in x ] ) for v in all_x }

iter_count = 13

x0 = [_x for _x in x]
rand.shuffle(x0)
# start with a random permutation
assignments = [x0,]

# initialize histograms
histograms = { _y : { _x : 0 for _x in all_x }  for _y in y }
# update histograms
last_assigment = assignments[-1]
for _y,_x in zip(y,last_assigment):
    histograms[_y][_x] += 1

# if true print only final solution
print_only_final_solution = True

for iter_num in range(iter_count-1):

    G = nx.DiGraph()
    G.add_node('sink',demand=ny)
    for _x in all_x:
        G.add_node(_x)
        G.add_edge(_x,'sink',capacity=x_count[_x]);

    for _y in y:
        min_count = min([ histograms[_y][_x] for _x in all_x ])
        G.add_node(_y,demand=-1)
        # rand_wgts are minor random pertubations of the weights to yeild
        # random preferences for assignments and to ensure a unique solution
        # based on randomness
        rand_wgts = [ i for i in range(len(all_x)) ]
        rand.shuffle(rand_wgts)
        for i,_x in enumerate(all_x):
            wgt = 1000*(histograms[_y][_x] - min_count) + rand_wgts[i]
            G.add_edge(_y,_x,capacity=1,weight=wgt)

    flow_dict = nx.min_cost_flow(G)

    assignment = [ _x for _y in y for _x in all_x if flow_dict[_y][_x] == 1]
    assignments.append(assignment)

    # update histograms
    for _y,_x in zip(y,assignment):
        histograms[_y][_x] += 1

    if not print_only_final_solution or iter_num == iter_count-2:
        print 'assignments:'
        for a in assignments:
            print a
        print ''
        print 'histogram:'
        print '  |',
        for _y in y:
            print _y,' ',
        print ''
        print '--|',
        for _y in y:
            print '-','-',
        print ''
        for _x in all_x:
            print _x, '|',
            for _y in y:
                print histograms[_y][_x], ' ',
            print ''
        print ''

对于 13 的分配编号，此实现产生“完美”的解决方案：

assignments:
['6', '2', '3', '4', '2', '7', '1', '5', '6', '4', '5', '3', '1']
['5', '3', '7', '6', '5', '2', '6', '3', '1', '1', '2', '4', '4']
['1', '4', '2', '5', '4', '6', '3', '1', '7', '2', '6', '5', '3']
['3', '5', '4', '1', '6', '5', '2', '2', '4', '3', '1', '7', '6']
['7', '6', '1', '3', '3', '1', '4', '6', '5', '5', '4', '2', '2']
['4', '7', '6', '2', '1', '3', '5', '4', '2', '6', '3', '1', '5']
['2', '1', '5', '4', '2', '4', '5', '3', '3', '7', '6', '6', '1']
['5', '3', '6', '6', '4', '4', '7', '5', '3', '1', '2', '1', '2']
['3', '2', '4', '2', '5', '6', '4', '1', '1', '5', '7', '3', '6']
['4', '6', '5', '7', '1', '3', '1', '2', '4', '2', '3', '6', '5']
['2', '4', '1', '5', '3', '1', '2', '6', '6', '3', '4', '5', '7']
['1', '1', '3', '3', '6', '5', '6', '7', '2', '4', '5', '2', '4']
['6', '5', '2', '1', '7', '2', '3', '4', '5', '6', '1', '4', '3']

histogram:
  | A   B   C   D   E   F   G   H   I   J   K   L   M
--| - - - - - - - - - - - - - - - - - - - - - - - - - -
1 | 2   2   2   2   2   2   2   2   2   2   2   2   2
2 | 2   2   2   2   2   2   2   2   2   2   2   2   2
3 | 2   2   2   2   2   2   2   2   2   2   2   2   2
4 | 2   2   2   2   2   2   2   2   2   2   2   2   2
5 | 2   2   2   2   2   2   2   2   2   2   2   2   2
6 | 2   2   2   2   2   2   2   2   2   2   2   2   2
7 | 1   1   1   1   1   1   1   1   1   1   1   1   1

对于 24，这产生：

assignments:
['6', '1', '3', '4', '1', '5', '4', '5', '3', '2', '2', '7', '6']
['5', '2', '4', '6', '7', '3', '1', '3', '1', '4', '6', '2', '5']
['7', '5', '2', '3', '3', '4', '5', '6', '6', '1', '1', '4', '2']
['4', '3', '6', '5', '2', '6', '2', '4', '7', '3', '5', '1', '1']
['1', '4', '5', '1', '6', '2', '6', '2', '5', '7', '3', '3', '4']
['2', '6', '7', '2', '5', '1', '3', '1', '4', '6', '4', '5', '3']
['3', '7', '1', '2', '4', '1', '6', '3', '2', '5', '4', '6', '5']
['5', '6', '1', '1', '2', '6', '5', '7', '4', '3', '2', '4', '3']
['4', '1', '5', '7', '6', '3', '2', '4', '6', '1', '3', '5', '2']
['1', '3', '6', '4', '3', '2', '7', '2', '5', '5', '6', '1', '4']
['6', '4', '3', '6', '5', '5', '4', '1', '3', '2', '1', '2', '7']
['2', '5', '2', '3', '4', '4', '1', '5', '1', '6', '7', '3', '6']
['3', '2', '4', '5', '1', '7', '3', '6', '2', '4', '5', '6', '1']
['7', '5', '3', '6', '3', '1', '4', '2', '4', '5', '6', '2', '1']
['5', '1', '4', '2', '4', '2', '7', '6', '1', '3', '3', '5', '6']
['3', '7', '1', '4', '6', '5', '6', '1', '2', '2', '5', '3', '4']
['2', '2', '6', '1', '7', '4', '5', '3', '5', '6', '4', '1', '3']
['4', '3', '2', '5', '2', '6', '3', '4', '7', '1', '1', '6', '5']
['1', '6', '7', '3', '5', '3', '1', '5', '6', '4', '2', '4', '2']
['6', '4', '5', '4', '1', '1', '2', '5', '3', '7', '2', '6', '3']
['6', '5', '1', '3', '2', '6', '2', '3', '4', '4', '5', '1', '7']
['5', '1', '2', '6', '4', '3', '3', '6', '2', '5', '4', '7', '1']
['2', '3', '5', '1', '6', '2', '1', '4', '5', '3', '7', '4', '6']
['3', '6', '4', '2', '1', '5', '4', '7', '3', '6', '1', '5', '2']

histogram:
  | A   B   C   D   E   F   G   H   I   J   K   L   M
--| - - - - - - - - - - - - - - - - - - - - - - - - - -
1 | 3   4   4   4   4   4   4   3   3   3   4   4   4
2 | 4   3   4   4   4   4   4   3   4   3   4   3   4
3 | 4   4   3   4   3   4   4   4   4   4   3   3   4
4 | 3   3   4   4   4   3   4   4   4   4   4   4   3
5 | 4   4   4   3   3   4   3   4   4   4   4   4   3
6 | 4   4   3   4   4   4   3   4   3   4   3   4   4
7 | 2   2   2   1   2   1   2   2   2   2   2   2   2

而对于 26，这产生了另一个完美的解决方案：

assignments:
['5', '1', '1', '6', '7', '6', '4', '5', '2', '4', '2', '3', '3']
['1', '2', '4', '4', '5', '7', '5', '2', '1', '3', '3', '6', '6']
['3', '5', '6', '3', '1', '2', '2', '4', '5', '7', '6', '4', '1']
['2', '3', '5', '2', '4', '1', '1', '6', '3', '6', '4', '5', '7']
['6', '4', '2', '1', '3', '4', '3', '1', '6', '5', '7', '2', '5']
['4', '6', '7', '5', '2', '3', '6', '3', '4', '1', '5', '1', '2']
['6', '2', '3', '5', '6', '5', '3', '7', '1', '2', '1', '4', '4']
['5', '5', '6', '2', '1', '2', '7', '4', '3', '1', '6', '3', '4']
['1', '4', '1', '7', '3', '6', '2', '3', '6', '4', '5', '2', '5']
['4', '1', '5', '3', '6', '3', '4', '1', '7', '6', '2', '5', '2']
['2', '7', '2', '1', '4', '1', '5', '6', '4', '5', '3', '6', '3']
['7', '3', '3', '6', '2', '4', '1', '5', '5', '2', '4', '1', '6']
['3', '6', '4', '4', '5', '5', '6', '2', '2', '3', '1', '7', '1']
['4', '3', '2', '5', '6', '5', '1', '4', '3', '2', '6', '7', '1']
['6', '5', '4', '2', '5', '7', '3', '1', '2', '1', '3', '4', '6']
['1', '4', '6', '6', '2', '2', '7', '3', '5', '3', '4', '1', '5']
['5', '2', '1', '4', '1', '6', '5', '7', '4', '6', '2', '3', '3']
['2', '1', '5', '3', '4', '3', '2', '6', '1', '4', '5', '6', '7']
['3', '6', '7', '1', '3', '4', '4', '5', '6', '5', '1', '2', '2']
['1', '2', '3', '3', '4', '1', '6', '2', '5', '7', '6', '5', '4']
['6', '3', '1', '5', '6', '2', '1', '4', '7', '3', '5', '4', '2']
['3', '4', '4', '1', '7', '6', '5', '3', '2', '6', '2', '5', '1']
['7', '6', '3', '6', '5', '5', '4', '2', '1', '4', '1', '2', '3']
['2', '7', '6', '2', '1', '3', '6', '5', '3', '5', '4', '1', '4']
['5', '1', '5', '4', '3', '4', '2', '1', '6', '2', '7', '3', '6']
['4', '5', '2', '7', '2', '1', '3', '6', '4', '1', '3', '6', '5']

histogram:
  | A   B   C   D   E   F   G   H   I   J   K   L   M
--| - - - - - - - - - - - - - - - - - - - - - - - - - -
1 | 4   4   4   4   4   4   4   4   4   4   4   4   4
2 | 4   4   4   4   4   4   4   4   4   4   4   4   4
3 | 4   4   4   4   4   4   4   4   4   4   4   4   4
4 | 4   4   4   4   4   4   4   4   4   4   4   4   4
5 | 4   4   4   4   4   4   4   4   4   4   4   4   4
6 | 4   4   4   4   4   4   4   4   4   4   4   4   4
7 | 2   2   2   2   2   2   2   2   2   2   2   2   2

请注意，大部分随机性是通过选择作为分配的初始排列来注入的。之后，问题主要是确定性的，随机性要少得多。尽管如此，这个实现注入了少量的随机性，通过使用rand_wgts这些随机性在每个分配中给出随机（次要）偏好。

python - 组之间的平衡洗牌

问题描述

解决方案

推荐阅读