Python: Multiprocessing on Windows -> Shared Read-Only Memory

Problem description

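Is there a way to share a huge dictionary with multiprocessing subprocesses on Windows without duplicating the whole thing in memory? If it helps, I only need read-only access to it in the subprocesses.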

My program looks roughly like this:

import multiprocessing


def workerFunc(args):
    id, data_mp, some_more_args = args

    # Do some logic
    # Parse some files on the disk
    # and access some random keys from data_mp which are only known after parsing those files on disk ...
    some_keys = [some_random_ids...]

    # Look up the values for those keys in the shared dict
    do_something = [data_mp[x] for x in some_keys]
    return do_something


if __name__ == "__main__":
    multiprocessing.freeze_support()    # Using this script as a PyInstalled .exe later on ...

    DATA = readpickle('my_pickle.pkl')   # my_pickle.pkl is huge, ~1GB
    # DATA looks like this:
    # {1: ['some text', SOME_1D_OR_2D_LIST...[[1,2,3], [123...]]], 
    #  2: ..., 
    #  3: ..., ..., 
    #  1 million keys... }

    # Here I'm doing something with DATA in the main program...

    # Then I want to spawn N multiprocessing subprocesses, each doing some logic and then accessing a few keys of DATA to read from ...

    manager = multiprocessing.Manager()
    data_mp = manager.dict(DATA)    # Right now I'm putting DATA into the shared memory... so it effectively duplicates the required memory...

    joblist = []
    for idx in range(10000): # Generate the workers, pass the shared memory link data_mp to each worker later on ...
        joblist.append((idx, data_mp, some_more_args))

    # Start Pool of Procs... 
    p = multiprocessing.Pool()
    returnNodes = []
    for ret in p.imap_unordered(workerFunc, joblist):   # was `jobList`, which is undefined (NameError)
        returnNodes.append(ret)

    # Do some after work with DATA and returnNodes...
    # and generate some overview xls-file out of it

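Unfortunately, there is no other way to store my big dictionary... I know an SQL database would be better, because each worker only accesses a few keys of data_mp in its subprocess, but I don't know in advance which keys each worker will address.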

So my question is: is there any other way to do this on Windows instead of using Manager.dict(), which, as mentioned above, effectively duplicates the required memory?

Thanks!

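EDIT: Unfortunately, in my corporate environment I can't use an SQL DB for my tool, because no dedicated machine is available. I can only work file-based on network drives. I have already tried SQLite, but it was extremely slow (even though I don't understand why...). And yes, DATA is a simple key -> value dictionary...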

And I'm using Python 2.7!

Tags: python, dictionary, multiprocessing, pickle, python-multiprocessing

Solution
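
One possible approach, offered only as a sketch: since the worker in the question does nothing with the dictionary except look up a handful of keys and return the values, the lookup could be moved into the parent process. The workers then do the file parsing and return only the keys they need, and the big dict never has to be copied or proxied at all. This assumes the workers do not need the looked-up values for further computation. The names `find_keys`, `lookup`, `run_jobs` and `keys_per_job` are hypothetical, and the key computation is a placeholder for the real parsing step. The code sticks to syntax that should also run on Python 2.7.

```python
import multiprocessing


def find_keys(job):
    """Worker: do the per-job parsing and return only the keys it needs.

    Hypothetical stand-in: the real worker would parse files on disk here.
    """
    idx, keys_per_job = job
    return idx, [idx * keys_per_job + offset for offset in range(keys_per_job)]


def lookup(data, keys):
    """Parent-side lookup; `data` is the ordinary in-process dict.

    Keys that are missing from `data` are skipped.
    """
    return [data[k] for k in keys if k in data]


def run_jobs(data, jobs, processes=None):
    """Fan the parsing out to a pool, then resolve the keys in the parent."""
    pool = multiprocessing.Pool(processes)
    results = {}
    try:
        # Only small (idx, keys) tuples cross the process boundary.
        for idx, keys in pool.imap_unordered(find_keys, jobs):
            results[idx] = lookup(data, keys)
    finally:
        pool.close()
        pool.join()
    return results
```

On Windows, `run_jobs(DATA, [(idx, 3) for idx in range(10000)])` would have to be called from inside the `if __name__ == "__main__":` block, after `multiprocessing.freeze_support()`, just like the pool in the question. The trade-off is that all lookups run serially in the parent, which is usually cheap for plain dict access.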

