首页 > 解决方案 > 返回具有半随机排名的加权对象列表

问题描述

假设我有一个看起来像这样的对象列表(在 Python 中)(包含标识符和排名/权重):

objects = [
    ("object_1", 0.50),
    ("object_2", 0.75),
    ("object_3", 0.25),
    ("object_4", 0.01),
    ("object_5", 0.99),
]

我想返回这个相同objects的数组,但它们的权重是半随机的。也就是说,我并不总是想返回:

[
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

但宁愿允许一些不确定性,这样,一般来说,返回的数组看起来像上面,但也可能看起来像:

[
    ("object_5", 0.99),
    ("object_1", 0.50),
    ("object_2", 0.75),
    ("object_4", 0.01),
    ("object_3", 0.25),
]

编辑:我我问的是与这个不同的问题,因为这里的排序很重要;在另一个问题中,顺序无关紧要(再次,我认为!)。

标签: pythonlistrandomrankingweighted

解决方案


如果我没记错的话,一种方法可能是加权样本而不进行替换:

from random import choices


def weighted_sample_without_replacement(population, weights, k=1):
    #    https://stackoverflow.com/a/43649323/4001592
    weights = list(weights)
    positions = range(len(population))
    indices = []
    while True:
        needed = k - len(indices)
        if not needed:
            break
        for i in choices(positions, weights, k=needed):
            if weights[i]:
                weights[i] = 0.0
                indices.append(i)
    return [population[i] for i in indices]


data = [
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

_, weights = zip(*data)
sample = weighted_sample_without_replacement(data, weights, k=len(data))
print(sample)

输出 (单次运行)

[('object_2', 0.75), ('object_5', 0.99), ('object_3', 0.25), ('object_1', 0.5), ('object_4', 0.01)]

一个基本的实验分析似乎验证了我的假设:

from collections import defaultdict
from operator import itemgetter

_, weights = zip(*data)
counts = defaultdict(lambda : defaultdict(int))
for _ in range(1000):
    sample = weighted_sample_without_replacement(data, weights, k=len(data))
    for i, (key, _) in enumerate(sample):
        counts[i][key] += 1

for key, values in counts.items():
    print(key, sorted(values.items(), key=itemgetter(1), reverse=True))

输出 (实验)

0 [('object_5', 415), ('object_2', 290), ('object_1', 186), ('object_3', 106), ('object_4', 3)]
1 [('object_2', 322), ('object_5', 309), ('object_1', 241), ('object_3', 119), ('object_4', 9)]
2 [('object_1', 319), ('object_2', 259), ('object_3', 209), ('object_5', 199), ('object_4', 14)]
3 [('object_3', 533), ('object_1', 239), ('object_2', 126), ('object_5', 75), ('object_4', 27)]
4 [('object_4', 947), ('object_3', 33), ('object_1', 15), ('object_2', 3), ('object_5', 2)]

该值'object_5'在 1000 次中的 724 次中位于前两个位置,而在 1000'object_4'次中的 947 次中位于最后位置。为了更好地可视化结果,请参见下图(可视化是通过额外运行实验生成的设置):

按对象分配位置

可以在此处找到重现实验的代码。


推荐阅读