Parallel execution of functions in Python

Question

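Below is the parallel code. When I run it, I get this error: UnboundLocalError: local variable 'df_project' referenced before assignment. I don't know what I'm doing wrong. When I run the same code as a regular (serial) function, it works fine.

Any input would be a great help.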

from multiprocessing import Pool

def square(x):
    # calculate the square of the value of x
    v=x['content'][0]['template']['module']
    if isinstance(v, list):
        for i, v2 in enumerate(v): 
            df_project,df_module,df_module_header,df_module_paragraph,df_module_image,df_module_chart,df_module_chart_row,df_module_list = normalizeJSON(x,i,v2['id'])
    else:
        print('module is not a list')

    return df_project,df_module,df_module_header,df_module_paragraph,df_module_image,df_module_chart,df_module_chart_row,df_module_list

if __name__ == '__main__':

    # Define the dataset
    dataset = result_list

    # Run this with a pool of 5 agents having a chunksize of 3 until finished
    agents = 5
    chunksize = 3
    project=pd.DataFrame()
    module=pd.DataFrame()
    content_module_columns=["module_id","module_text", "project_id", "project_revision","index"]  
    dim_content_module=pd.DataFrame(columns = content_module_columns)

    with Pool(processes=agents) as pool:
         project,module, module_header,module_paragraph,module_image,module_chart,module_chart_row,module_list=pool.map(square,dataset,chunksize)


Below is the normal (serial) version of the code that I'm trying to parallelize:

    for index in range(len(result_list)):
        print('processing file number:', index)
        d=result_list[index]
        v=d['content'][0]['template']['module']
        if isinstance(v, list):
           for i, v2 in enumerate(v): 
               df_project,df_module = normalizeJSON(d,i,v2['id'])
               dim_content_module=dim_content_module.append(df_module, 
                                  ignore_index=True,sort=False)
        else:
            print('module is not a list')

I don't get any error in the serial version with the same input.

*result_list* is a list of dictionaries

Tags: python, python-multithreading

Answer


It is hard to be sure without seeing the full stack trace, but the problem is most likely in this function:


def square(x):
    # calculate the square of the value of x
    v=x['content'][0]['template']['module']
    if isinstance(v, list):
        for i, v2 in enumerate(v): 
            df_project,df_module,df_module_header,df_module_paragraph,df_module_image,df_module_chart,df_module_chart_row,df_module_list = normalizeJSON(x,i,v2['id'])
    else:
        print('module is not a list')

    return df_project,df_module,df_module_header,df_module_paragraph,df_module_image,df_module_chart,df_module_chart_row,df_module_list

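Most likely, in at least one of the calls the isinstance check returns False (or v is an empty list, so the loop body never runs). None of the df_* variables is assigned on that path, so the return statement references names that were never bound, and Python raises UnboundLocalError. The serial version does not hit this because it never reads those variables after the loop.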
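One possible way to avoid the error is to make every path through the worker return a value. The sketch below is only an illustration: normalize_stub is a hypothetical stand-in for the asker's normalizeJSON (whose real return values are not shown in the question), the sample records are invented, and process_record is a renamed, simplified version of square. It returns a list with one entry per module, which is empty when the module field is not a list or has no items.

```python
from multiprocessing import Pool


def normalize_stub(record, index, module_id):
    # Hypothetical stand-in for the asker's normalizeJSON (assumption):
    # returns a small tuple instead of the real DataFrames.
    return ({"project": record["id"]}, {"module_id": module_id, "index": index})


def process_record(record):
    """Return one result per module; an empty list if there is nothing to process."""
    modules = record["content"][0]["template"]["module"]
    if not isinstance(modules, list):
        # In the original function nothing was assigned on this path,
        # so the final return raised UnboundLocalError.
        return []
    # An empty list of modules simply yields an empty result list.
    return [normalize_stub(record, i, m["id"]) for i, m in enumerate(modules)]


if __name__ == "__main__":
    # Invented sample data covering the three cases discussed above.
    dataset = [
        {"id": 1, "content": [{"template": {"module": [{"id": "a"}, {"id": "b"}]}}]},
        {"id": 2, "content": [{"template": {"module": {"id": "c"}}}]},  # not a list
        {"id": 3, "content": [{"template": {"module": []}}]},  # empty list
    ]
    with Pool(processes=2) as pool:
        # pool.map returns a list with one entry per input record.
        per_record = pool.map(process_record, dataset, chunksize=2)
    # Flatten the per-record lists into a single list of module results.
    flat = [item for results in per_record for item in results]
    print(len(per_record), len(flat))
```

Note that pool.map returns a list with one entry per element of the input, so its result cannot be unpacked directly into eight DataFrame names as in the question; the per-record results have to be combined afterwards, for example by flattening as above and then concatenating the pieces.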

