Multiprocessing with return values

Problem description

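I'm having trouble using multiprocessing to speed up the processing of some files stored on S3 that need to be checked. I'm new to multiprocessing, so I'm not sure what exactly is wrong. The code ran without issues when I used a plain for loop.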

import multiprocessing as mp
from tqdm import tqdm

def read_json(file):
    file_key = file["Key"]
    file_key_split = file_key.split("/")
    document = get_json_details(file_key)
    type = file_key_split[2]
    return document, type

document_list = []
document_type_list = []

mgr = mp.Manager()
nodes = mgr.list()
pool_size = mp.cpu_count()
pool = mp.Pool(processes=pool_size)
# mp.freeze_support()

for file in tqdm(get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix)):
    document_list, document_type_list = zip(*pool.map(read_json, file))

pool.close()
pool.join()

The error I get is as follows:

"""
Traceback (most recent call last):
  File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\GIT\BMWJPSI-BI\03_Lambda_Functions\RegoOCRCheck.py", line 118, in read_json
    file_key = file["Key"]
TypeError: string indices must be integers
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:/GIT/BMWJPSI-BI/03_Lambda_Functions/RegoOCRCheck.py", line 151, in <module>
    document_list, document_type_list = zip(pool.map(read_json, file))
  File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
TypeError: string indices must be integers

Thanks for your help.

Tags: python, python-3.x, multiprocessing

Solution


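Sorry for the late reply. I think the problem is that you are passing a single dictionary object (one S3 object, file) to pool.map. pool.map iterates over whatever it is given, and iterating over a dictionary yields its keys rather than the dictionary itself. Each worker therefore calls read_json with a string such as "Key", and file["Key"] on a string raises TypeError: string indices must be integers. Calling pool.map inside the for loop also overwrites document_list and document_type_list on every iteration. Instead of looping over each file and running pool.map on it, pass the whole get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix) iterable to pool.map. It calls read_json once per object and returns a list of (document, type) tuples, which zip(*...) then splits into document_list and document_type_list: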
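A minimal sketch of the failure, with no S3 and no worker processes. The dict below is a hypothetical stand-in for one item yielded by get_all_s3_objects, and get_key is a hypothetical helper that performs only the first step of read_json. The built-in map is used because it iterates its argument the same way pool.map does.

```python
# Hypothetical stand-in for one object dict yielded by get_all_s3_objects.
s3_object = {"Key": "incoming/2020/rego/doc1.json", "Size": 123}


def get_key(file):
    # Same first step as read_json in the question: expects a dict.
    return file["Key"]


# Iterating over a dict yields its keys, so mapping over a single
# object dict hands the function plain strings such as "Key".
items = list(s3_object)
print(items)  # ['Key', 'Size']

try:
    get_key(items[0])  # evaluates "Key"["Key"]
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Mapping over an iterable of dicts passes each whole dict instead.
keys = list(map(get_key, [s3_object]))
print(keys)  # ['incoming/2020/rego/doc1.json']
```

The exact wording of the TypeError message varies between Python versions, but the cause is the same: the function receives a key string, not the object dict.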

document_list, document_type_list = zip(*pool.map(read_json, get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix)))
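An end-to-end sketch of this fix that runs without AWS. get_all_s3_objects and get_json_details are hypothetical offline stand-ins (the question does not show their real bodies), and the key layout docs/2020/<type>/<name>.json is assumed so that split("/")[2] yields the type. The if __name__ == "__main__": guard matters because the traceback shows Windows, where multiprocessing starts workers with the spawn method and re-imports the main module.

```python
import multiprocessing as mp

# Hypothetical sample objects; real ones come from the S3 listing.
FAKE_OBJECTS = [
    {"Key": "docs/2020/rego/a.json"},
    {"Key": "docs/2020/invoice/b.json"},
    {"Key": "docs/2020/rego/c.json"},
]


def get_all_s3_objects():
    # Hypothetical stand-in: yields one object dict (with a "Key") per file.
    yield from FAKE_OBJECTS


def get_json_details(file_key):
    # Hypothetical stand-in: the real helper would download and parse the JSON.
    return {"source": file_key}


def read_json(file):
    # Runs in a worker process; `file` is one whole object dict.
    file_key = file["Key"]
    document = get_json_details(file_key)
    doc_type = file_key.split("/")[2]
    return document, doc_type


def main():
    with mp.Pool(processes=mp.cpu_count()) as pool:
        # One map call over the whole iterable; results keep input order.
        results = pool.map(read_json, get_all_s3_objects())
    if not results:
        # zip(*[]) yields nothing, so unpacking it into two names would fail.
        return [], []
    # Turn [(doc, type), ...] into (doc, ...) and (type, ...).
    document_list, document_type_list = zip(*results)
    return list(document_list), list(document_type_list)


if __name__ == "__main__":
    documents, doc_types = main()
    print(doc_types)  # ['rego', 'invoice', 'rego']
```

Making a single pool.map call over the full listing lets the pool split the work into chunks across all processes. Pool.map converts a generator to a list before dispatching it, so the whole listing is held in memory at once.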

Let me know if you still run into any problems.
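The question never shows get_all_s3_objects. A helper with that call shape is commonly built on boto3's list_objects_v2 paginator. The sketch below is an assumption about what it might look like, not the asker's code. Each yielded item is a dict such as {"Key": ..., "Size": ...}, which is why read_json should receive one whole item rather than iterate over it.

```python
from typing import Any, Dict, Iterator


def get_all_s3_objects(s3, **base_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every object dict under a bucket/prefix, one page at a time.

    Hypothetical helper: `s3` is assumed to be a boto3 S3 client, and
    base_kwargs are passed straight to list_objects_v2 (e.g. Bucket, Prefix).
    """
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(**base_kwargs):
        # Each page's "Contents" list holds one dict per object; the key is
        # absent when the bucket/prefix matches nothing.
        yield from page.get("Contents", [])
```

With a real client, the call would look like pool.map(read_json, get_all_s3_objects(boto3.client("s3"), Bucket=docbucket, Prefix=prefix)). The listing runs in the parent process, so the client itself is never sent to the workers.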

