Random seed for order of elements in Python's Set to List conversion

Problem description

I was executing some code in a Jupyter notebook and noticed that each time I ran it, the output was different despite not explicitly putting randomness in my program.

I narrowed it down to a line that removes all repeated elements from a list.

l = list(set(l))

I noticed two things:

Is there some kind of hidden random seed that is used for the set -> list conversion for a given kernel? How does it work under the hood, and what would I do if I wanted deterministic output from the above code?

Tags: python, random, set

Solution


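A set works almost identically to a dict, keyed by the hash of each object. For most objects (in CPython), the default __hash__ function relies on their id, which in turn relies on their address in memory.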

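A new kernel means the objects have different addresses, which means different ids, different hashes, and a different order from the iterator the set gives you.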
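A minimal sketch of that chain, assuming CPython; Node is a hypothetical class introduced only for illustration. The printed numbers differ from one kernel to the next, so no expected values are shown for them.

```python
class Node:
    """Hypothetical class that defines neither __eq__ nor __hash__."""


a, b = Node(), Node()

# Node inherits object's default __hash__, which CPython derives from id().
print(Node.__hash__ is object.__hash__)  # True

# These values change whenever the objects land at different addresses,
# for example after restarting the kernel.
print(id(a), hash(a))
print(id(b), hash(b))

# The set's iteration order follows those hashes, so it can change too.
order = ["a" if x is a else "b" for x in {a, b}]
print(order)
```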

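This is implementation-dependent, so you can't rely on it; I can only say that CPython has worked this way so far. What you can rely on is that a set has no (useful) ordering.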
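One addition that is not part of the original answer: for str and bytes objects, CPython (3.3 and later) really does use a per-process random salt for hashing, controlled by the PYTHONHASHSEED environment variable. The variable is read at interpreter startup, so setting it inside an already running kernel has no effect. The sketch below starts fresh interpreters to show that a fixed seed gives a repeatable order; run_with_seed and CHILD are hypothetical names, and a repeatable order is still not a meaningful one.

```python
import os
import subprocess
import sys

# Child program: build a set of strings and print its iteration order.
CHILD = "print(list(set(['apple', 'banana', 'cherry', 'date'])))"


def run_with_seed(seed):
    """Run CHILD in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Same seed in two separate processes: the printed order matches.
first = run_with_seed(0)
second = run_with_seed(0)
print(first == second)  # True
```

Without a fixed seed, each new process picks its own salt, which is why sets of strings can come out in a different order after a kernel restart.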

If you need ordering, keep both a list and a set. If you want to remove duplicates while preserving order, you can use the following:

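def could_add(s, x):
    """Add x to the set s; return True only if x was not already present."""
    if x in s:
        return False
    s.add(x)
    return True

seen = set()
# Keep the first occurrence of each element, preserving the original order.
l = [x for x in l if could_add(seen, x)]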

(Though I fully agree with Barmar's comment: if the order matters, the elements should be sortable.)
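Two shorter alternatives, sketched here on the assumption that the elements are hashable and, for the second, mutually comparable. The items list is made-up sample data. dict.fromkeys relies on dicts preserving insertion order, which the language guarantees from Python 3.7 on.

```python
items = ["pear", "apple", "pear", "fig", "apple"]

# Order-preserving de-duplication: dict keys are unique and keep
# insertion order, so the first occurrence of each element wins.
unique_in_order = list(dict.fromkeys(items))
print(unique_in_order)  # ['pear', 'apple', 'fig']

# If any consistent order will do, sorting the set gives the same
# result regardless of hashes or memory addresses.
unique_sorted = sorted(set(items))
print(unique_sorted)  # ['apple', 'fig', 'pear']
```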

