How to efficiently merge 3 or more dictionaries

Question

The most efficient way to merge two dictionaries appears to be dict unpacking:

def merge_dicts(x, y):
    return {**x, **y}

But what if you have more than two dictionaries? And what if you don't know in advance how many you will have?

The following is not supported and raises a SyntaxError (see the Variations section of PEP 448):

def merge_dicts(*dicts):
    return {**d for d in dicts}
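
One way to confirm this without crashing a script is to hand the source text to compile() and catch the error. This is only a sketch: is_valid_syntax is a helper name made up for illustration, and the result depends on the interpreter version, so the version is printed alongside it.

```python
import sys


def is_valid_syntax(source: str) -> bool:
    """Return True if Python can compile the source, False on SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


print("Python", sys.version_info[:2])

# Unpacking inside a dict comprehension (the form from the question).
print(is_valid_syntax("merged = {**d for d in dicts}"))

# Unpacking inside a dict display, as allowed by PEP 448 (Python 3.5+).
print(is_valid_syntax("merged = {**a, **b}"))
```

Because compile() only parses the text, the undefined names a, b and dicts do not matter here.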

This alternative is almost as readable and performs reasonably well:

def merge_dicts_comprehension(*dicts):
    return {k: v for d in dicts for k, v in d.items()}

But unpacking is still faster. For example, if you know that len(dicts) == 5, this is faster:

def merge_dicts_unpack(*dicts):
    return {
        **dicts[0],
        **dicts[1],
        **dicts[2],
        **dicts[3],
        **dicts[4],
    }

But how can this be done dynamically?

Here is one way, but it's horrible:

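from types import FunctionType


def merge_dicts_dynamic(*dicts):
    num = len(dicts)
    func_name = f"_merge_{num}"
    try:
        # Reuse the function generated earlier for this number of dicts.
        func = globals()[func_name]
    except KeyError:
        # Build source such as "def _merge_2(*d): return {**d[0], **d[1]}".
        spread = ', '.join([f"**d[{x}]" for x in range(num)])
        func_text = f'def {func_name}(*d):\n    '
        func_text += f'return {{{spread}}}'
        foo_code = compile(func_text, "<string>", "exec")
        # In CPython the first constant of the module code is the function's code object.
        func = FunctionType(foo_code.co_consts[0], globals(), func_name)
        globals()[func_name] = func
    return func(*dicts)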

It beats the {k: v for d in dicts for k, v in d.items()} approach. But again, it's horrible.

How can we get performance comparable to unpacking without generating a function dynamically?

Other variants:

def merge_dicts_unpack_iter(*dicts):
    val = {}
    for d in dicts:
        val = {**val, **d}
    return val

Variants from the answers:

From @jizhihaoSAMA:

from functools import reduce


def merge_dicts_reduce(*dicts):
    return reduce(lambda x, y: {**x, **y}, dicts)

From @RonSerruya:

def merge_dicts_update(*dicts):
    d = dict()
    for x in dicts:
        d.update(x)
    return d
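
Before comparing speed, it is worth checking that the candidates agree on the result, in particular that a later dictionary's value wins when keys collide. The following is a small self-contained sketch; the sample dictionaries are made up for illustration, and the empty-dict initializer passed to reduce is an addition that is not in the variant quoted above.

```python
from functools import reduce


def merge_dicts_comprehension(*dicts):
    return {k: v for d in dicts for k, v in d.items()}


def merge_dicts_reduce(*dicts):
    # The {} initializer is an addition so that zero arguments also work.
    return reduce(lambda x, y: {**x, **y}, dicts, {})


def merge_dicts_update(*dicts):
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


# Illustrative inputs with overlapping keys "b" and "c".
samples = [{"a": 1, "b": 2}, {"b": 20, "c": 3}, {"c": 30, "d": 4}]

results = [
    merge_dicts_comprehension(*samples),
    merge_dicts_reduce(*samples),
    merge_dicts_update(*samples),
]

# Every variant should produce the same mapping, with later values winning.
assert all(r == results[0] for r in results)
print(results[0])  # {'a': 1, 'b': 20, 'c': 30, 'd': 4}
```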

Test data:

test_dicts = [{str(x): x for x in range(num, num + 20)} for num in [20, 40, 80, 100, 120]]

Test code:

import timeit

for test_func in [
    'comprehension',
    'unpack_iter',
    'dynamic',
    'reduce',
    'update',
    'unpack',
]:
    result = timeit.timeit(
        stmt=f'merge_dicts_{test_func}(*test_dicts)',
        number=1000000,
        globals=globals(),
    )
    print(f"{test_func}: {result}")

Results (seconds per 1,000,000 calls):

comprehension: 7.904960700000629
unpack_iter: 7.860937651999848
dynamic: 3.4844029209998553
reduce: 8.41494683899964
update: 3.8839738759998
unpack: 3.184907333999945

Conclusion:

I agree with Ron that the update version offers the best compromise between readability and performance.

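It is not as fast as static or dynamic unpacking, but it comes very close (about 3.88 s, against 3.18 s and 3.48 s respectively).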
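
On Python 3.9 or later, the in-place union operator from PEP 584 expresses the same loop slightly more compactly. This variant was not part of the benchmark above, so no timing is claimed for it, and merge_dicts_union is a hypothetical name chosen to match the others.

```python
def merge_dicts_union(*dicts):
    """Merge dicts left to right with the |= operator (Python 3.9+)."""
    merged = {}
    for d in dicts:
        merged |= d  # same effect as merged.update(d)
    return merged


# Same test data as in the question: five dicts of 20 distinct keys each.
test_dicts = [{str(x): x for x in range(num, num + 20)} for num in [20, 40, 80, 100, 120]]
merged = merge_dicts_union(*test_dicts)
print(len(merged))  # 100
```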

Tags: python, python-3.x

Answer



In [1]: d1 = {x:1 for x in range(100)}

In [2]: d2 = {x:1 for x in range(200,300)}

In [3]: d3 = {x:1 for x in range(350,500)}

In [4]: d4 = {x:1 for x in range(400,500)}

In [5]: d5 = {x:1 for x in range(600,720)}

(Inputs In [6] to In [8] are not shown in the post; dicts is presumably the list [d1, d2, d3, d4, d5].)

In [9]: def merge_dicts(dicts):
   ...:     return {**dicts[0],
   ...:     **dicts[1],
   ...:     **dicts[2],
   ...:     **dicts[3],
   ...:     **dicts[4]}
   ...:

In [10]: %timeit merge_dicts(dicts)
12.7 µs ± 53 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [11]: def merge_update(dicts):
    ...:     d = dict()
    ...:     for x in dicts:
    ...:         d.update(x)
    ...:     return d
    ...:

In [12]: %timeit merge_update(dicts)
13.3 µs ± 52.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: def merge_items(dicts):
    ...:     return {k: v for d in dicts for k, v in d.items()}
    ...:

In [14]: %timeit merge_items(dicts)
32.9 µs ± 323 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The second approach, using .update, looks good enough.

