How to: multiprocess many iterations over just a few dictionaries?

Problem description

Okay, this is a little hard for me to explain, but I'll try to be clear:

Say there is a dictionary like the one below (ignore the absurdity; it was made with a random name/food generator):

all_data = {"albina":"broth", "carlota":"vanilla", "masako":"garlic powder", "latoyia":"hamburger", "earl":"broccoli", "kristen":"raw sugar", "ione":"mustard", "chauncey":"rabbits", "jolie":"cantaloupes", "carina":"onion powder", "larae":"condensed milk",
"myriam":"asiago cheese", "christal":"coconut oil", "roselia":"black beans", "arletta":"red snapper", "marketta":"grapefruits", "sheryll":"navy beans", "scot":"arugula", "fernando":"chocolate", "bernice":"lamb", "libby":"cloves", "blanca":"cider",
"antonette":"sweet chili sauce", "dena":"chickpeas", "ja":"graham crackers", "kathe":"provolone", "jon":"lemons", "elicia":"feta cheese", "jeanette":"turkeys", "regan":"bok choy", "sabrina":"panko bread crumbs", "salvatore":"liver", "natalie":"breadfruit", "kathie":"chili powder", 
"lorretta":"cider vinegar", "colby":"date sugar", "shirly":"pistachios", "bret":"bouillon", "cira":"artichokes", "larry":"macaroni", "reena":"mesclun greens", "charla":"parsley", "lilla":"Kahlua", "erick":"cannellini beans", "esteban":"sushi", "na":"prosciutto", "wilhelmina":"pheasants", 
"dorinda":"scallops", "marvin":"salt", "madison":"dates"}

Then say I have a list of people of interest:

people_of_interest = ["albina", "earl", "elicia", "jeanette", "madison"]

Now say I want to count how often each letter appears in these people's favorite foods. Easy enough; we just write a couple of functions:

def food_associator(names_list):
    # look up each name's favorite food in the master dictionary
    assoc_food = {}
    for name in names_list:
        assoc_food[name] = all_data[name]
    return assoc_food

def food_con_letters(data_dict):
    # count how often each letter appears across all the foods in data_dict
    consus_fLetters = {}
    for name in data_dict:
        food = data_dict[name]
        for letter in food:
            if letter not in consus_fLetters:
                consus_fLetters[letter] = 0
            consus_fLetters[letter] += 1
    return consus_fLetters

food_interest = food_associator(people_of_interest)
food_con_int = food_con_letters(food_interest)

print(food_con_int)

{'b': 2, 'r': 3, 'o': 3, 't': 4, 'h': 2, 'c': 3, 'l': 1, 'i': 1, 'f': 1, 'e': 6, 'a': 2, ' ': 1, 's': 3, 'u': 1, 'k': 1, 'y': 1, 'd': 1}

Okay. Now say I want to compare those letter frequencies against 5 randomly selected people. For that I can just run random.sample, easy enough. My random selection is also stored in a dictionary, and its letter counts end up in random_select_dict.
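A single random draw, reusing the two functions above, might look something like this (random_pick is just an illustrative name):

import random

# draw 5 random names and count the letters of their favorite foods
random_pick = random.sample(list(all_data), 5)
random_select_dict = food_con_letters(food_associator(random_pick))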

The problem I'm running into is this: say I want to do 100 random samplings and then compare the average frequency of each letter against the food_con_int data. That would essentially compare the letter frequencies of the persons of interest's favorite foods against a "normal distribution" built from the same number of randomly selected people. On each iteration, random_select_dict is rebuilt and its letter frequencies are added into a new dictionary we'll call average_random.
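In plain serial code, the loop I have in mind is roughly the following (totals, num_trials and sample_size are names I've made up for illustration):

import random

num_trials = 100
sample_size = len(people_of_interest)
totals = {}

for _ in range(num_trials):
    # rebuild random_select_dict from a fresh random draw
    random_pick = random.sample(list(all_data), sample_size)
    random_select_dict = food_con_letters(food_associator(random_pick))
    # accumulate this iteration's counts
    for letter, count in random_select_dict.items():
        totals[letter] = totals.get(letter, 0) + count

# average frequency of each letter over all the trials
average_random = {letter: total / num_trials for letter, total in totals.items()}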

Now that might be fine for the dataset I've provided here, but in my real data I want to do 10,000 to 50,000 random samplings over many more dictionaries, each of them large. Is there a way to implement multiprocessing and use multiple cores for the iterated random selection, which is my current bottleneck? Is it even possible to read from one dictionary and append to another across processes without corrupting the data?
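To make the question concrete, the kind of structure I'm picturing is sketched below, though I have no idea whether it's the right way to go (run_batch, sample_letter_counts, num_workers and per_worker are all names I made up): each worker runs a batch of samplings against its own read-only copy of all_data and returns a plain Counter of letter totals, and only the parent process merges the results, so no dictionary is ever written to by two processes at once.

import random
from collections import Counter
from multiprocessing import Pool

def sample_letter_counts(sample_size):
    # one random draw: pick names, look up their foods, count the letters
    random_pick = random.sample(list(all_data), sample_size)
    return food_con_letters(food_associator(random_pick))

def run_batch(args):
    # run several draws inside one worker and merge their counts locally
    num_draws, sample_size = args
    batch_total = Counter()
    for _ in range(num_draws):
        batch_total.update(sample_letter_counts(sample_size))
    return batch_total

if __name__ == "__main__":
    num_trials = 10000
    num_workers = 4
    per_worker = num_trials // num_workers
    with Pool(num_workers) as pool:
        batches = pool.map(run_batch, [(per_worker, len(people_of_interest))] * num_workers)
    # merge in the parent only; the workers never touch a shared dictionary
    grand_total = Counter()
    for batch in batches:
        grand_total.update(batch)
    average_random = {letter: count / num_trials for letter, count in grand_total.items()}
    print(average_random)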

Alternatively, can someone describe a different approach I could use to multiprocess iterated random sampling of data held in dictionaries? The randomly selected data passes through several functions before it is finally appended to the average_random dictionary.

I hope that's clear!

Tags: python, python-3.x, dictionary

Solution

