首页 > 解决方案 > Sub-dictionary from dictionary. Create another dictionary with dict comprehension, or delete keys from a copy of the original dictionary

问题描述

I have a large dictionary (+3000 key-values) and I need to constantly generate sub-dictionaries from the original dictionary (there is no way around it). Each Sub-dictionary contains around 300-600 key-values most of the times.

I am wondering what is more time efficient (memory efficiency is not important here):

I guess Option 2 makes more sense, since we are choosing a few key-values (300-600), instead of deleting a lot of key-values (+2000), but I dont know what is going on in the background in both cases, and there are counterintuitive cases out there.

Thank you

标签: pythondictionary

解决方案


对两个答案的一点反馈;恕我直言,这也不是准确的,因为他们选择键的方式。

如果您的数据以您可能不需要字典(而是排序列表)的方式设置,他们有效地将字典视为列表并保留 n 项。

关于选项 2

我敢打赌你正在执行这样的查找:

new_dict = {k:v for k,v in parent_dict.items() if k in selections}

在这种情况下,性能会随着len(selections)增加而降低;在某种程度上减轻这种情况的一种方法是使用集合而不是列表进行选择;有效O(m*n)(其中 m 是 parent_dict 的大小,n 是选择的数量)

因为 n 上的循环并不总是需要在完整的 n 上循环,它可能更接近1 和O((n**2-n)/2)n(n**2-2)/2之间所有数字的总和。

另一方面,选项 1中删除的循环是O(2m-n).

O(m)用于复制字典和O(m-n)删除密钥

有一些非香草(我想到了 lodash)解决方案经过高度优化并且可能会更快,但严格考虑时间复杂度,使用现实的键选择似乎应该更快地删除。

哪个选项最好可能取决于m&n


我刚刚有一个金发碧眼的时刻,但想把我的无知留给任何其他未来的路人

你应该做这个: new_dict = {k: parent_dict[k] for k in selections}

您无需O(m)为复制 dict、将项目添加到 dictO(1)并循环选择内容而付出代价,O(n)因此您可以坐在O(2n)ie上O(n)——我认为没有什么比这更快了。


好的,下面是代码:

import random
import timeit

m = 3_000
n0 = 300
n1 = 600

trials = 10_000

def remove(parent_dict, kept_keys):
    result = parent_dict.copy()
    for key in parent_dict.keys():
        if key not in kept_keys:
            del result[key]
    return result

def add_dict_comprehension(parent_dict, kept_keys):
    return {k:v for k,v in parent_dict.items() if k in kept_keys}

def add_key_comprehension(parent_dict, kept_keys):
    return {k:parent_dict[k] for k in kept_keys}

def add_dict_comprehension_set(parent_dict, kept_keys):
    return {k:v for k,v in parent_dict.items() if k in kept_keys}

def generate_test_data(m, n0, n1):
    parent_dict = {}
    for key in range(m):
        parent_dict[key] = 0

    target_keys = random.sample(range(m), random.randint(n0, n1))

    return parent_dict, target_keys

test_functions = {
    'remove': (remove, lambda x: x),
    'add_dict_comprehension': (add_dict_comprehension, lambda x: x),
    'add_key_comprehension': (add_key_comprehension, lambda x: x),
    'add_dict_comprehension_set': (add_dict_comprehension, lambda keys: set(keys)),
    'remove_set': (remove, lambda keys: set(keys)),
}

results = {k:[] for k in test_functions.keys()}

for _ in range(trials):
    parent_dict, target_keys = generate_test_data(m, n0, n1)


    for k,(func, preprocessor) in test_functions.items():

        # Not timing the preprocessing, assuming this is done on construction
        processed_keys = preprocessor(target_keys)
        
        starttime = timeit.default_timer()
        func(parent_dict, processed_keys)
        results[k].append(timeit.default_timer() - starttime)


def avg(lst):
    return sum(lst)/len(lst)

def stddev(lst):
    mean = avg(lst)
    r2 = sum([(i-mean)**2 for i in lst])
    bias = len(lst) - 1
    return (r2/bias)**.5

result_data = {k: {'avg': avg(v), 'stddev': stddev(v)} for k,v in results.items()}

title_len = max(len(i) for i in result_data.keys())
for (name, data) in sorted(result_data.items(), key = lambda item: item[1]['avg']):
    print(f"{name}: {''.join([' ']*(title_len-len(name)))} Avg. {data['avg']} [Stddev: {data['stddev']}]")

正如预期的那样,键理解是最快的,并且对于选项 1 和 2 来说,使用集合而不是列表更快)——集合的平均查找时间为 ,O(1)因此您可以避免数组的大部分搜索成本

add_key_comprehension:       Avg. 8.616499952040612e-05 [Stddev: 4.227672839185718e-05]
add_dict_comprehension_set:  Avg. 0.0001553589993272908 [Stddev: 3.535997011243502e-05]
remove_set:                  Avg. 0.00021654199925251306 [Stddev: 0.00017905888806226246]
add_dict_comprehension:      Avg. 0.013669110001646913 [Stddev: 0.0031769500128693085]
remove:                      Avg. 0.0138856839996879 [Stddev: 0.004191406076651088]

推荐阅读