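python - Python: multiprocessing on Windows -> shared read-only memory
Problem description
Is there a way to share a huge dictionary with multiprocessing subprocesses on Windows without duplicating the whole memory? If it helps, I only need read-only access to it in the subprocesses.
My program looks roughly like this: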
import multiprocessing

def workerFunc(args):
    job_id, data_mp, some_more_args = args
    # Do some logic
    # Parse some files on the disk
    # and access some random keys from data_mp which are only known after parsing those files on disk ...
    some_keys = [some_random_ids...]
    # Do something with the values behind those keys
    do_something = [data_mp[x] for x in some_keys]
    return do_something

if __name__ == "__main__":
    multiprocessing.freeze_support()  # Using this script as a PyInstalled .exe later on ...
    DATA = readpickle('my_pickle.pkl')  # my_pickle.pkl is huge, ~1GB (readpickle is my own helper, not shown)
    # DATA looks like this:
    # {1: ['some text', SOME_1D_OR_2D_LIST...[[1,2,3], [123...]]],
    #  2: ...,
    #  3: ..., ...,
    #  1 million keys... }
    # Here I'm doing something with DATA in the main program...
    # Then I want to spawn N multiprocessing subprocesses, each doing some logic and then accessing a few keys of DATA to read from ...
    manager = multiprocessing.Manager()
    data_mp = manager.dict(DATA)  # Right now I'm putting DATA into the shared memory... so it effectively duplicates the required memory...
    joblist = []
    for idx in range(10000):  # Generate the jobs, pass the shared memory link data_mp to each worker later on ...
        joblist.append((idx, data_mp, some_more_args))
    # Start Pool of Procs...
    p = multiprocessing.Pool()
    returnNodes = []
    for ret in p.imap_unordered(workerFunc, joblist):
        returnNodes.append(ret)
    # Do some after work with DATA and returnNodes...
    # and generate some overview xls-file out of it
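Unfortunately there is no other way to store my big dictionary... I know an SQL database would be better, because each worker only accesses a few keys of data_mp within its subprocess, but I don't know in advance which keys each worker will address.
So my question is: is there any other way to do this on Windows, instead of using a Manager.dict(), which, as described above, effectively duplicates the required memory?
Thanks!
EDIT: Unfortunately, in my corporate environment my tool cannot use an SQL DB, because there is no dedicated machine available. I can only work file-based on network drives. I already tried SQLite, but it was very slow (even though I don't understand why...). And yes, DATA is a simple key -> value kind of dictionary...
And I'm using Python 2.7!
Solution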