Python multiprocessing has extremely long overhead

Problem description

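I am reading several images, each into its own numpy array. They are all the same size and share no data, so I figured multiprocessing was the way to go. However, when I read the images in, there is a very long wait at the end, after they have already been loaded, for some reason.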

Here is the function I am calling:

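import os
import time

import numpy as np
from PIL import Image

INPUT_FILES_DIR = "input_files"  # placeholder: the real value is defined elsewhere in POF
start_time = time.time()  # module-level so the worker can read it; main()'s local start_time is not visible here


def image_to_matrix_mp(image_str):
    """
    Reads in image files and stores them in a numpy array. Doing operations on an array and then writing to the image
    is faster than doing them directly on the image. This uses multiprocessing to read each file simultaneously.
    :param image_str: file name of the image, relative to INPUT_FILES_DIR
    :return: array of shape (width, height) holding one pixel value (e.g. an RGB tuple) per element
    """
    pic = Image.open(os.path.join(INPUT_FILES_DIR, image_str))
    image_size = pic.size  # (width, height)
    # NumPy treats dtype=tuple as dtype=object: every element is a separate Python object
    array = np.empty([image_size[0], image_size[1]], dtype=tuple)
    for x in range(image_size[0]):
        if x % 1000 == 0:
            print("x = %d, --- %s seconds ---" % (x, time.time() - start_time))
        array_x = array[x]
        for y in range(image_size[1]):
            array_x[y] = pic.getpixel((x, y))

    return array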
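For comparison, here is a minimal sketch, assuming Pillow and NumPy are installed and an 8-bit RGB image: np.array(pic) converts the whole image in one call through Pillow's array interface, producing a compact uint8 array instead of an object array with one Python tuple per pixel. pixels_to_array and image_to_matrix_loop are hypothetical names; the loop version mirrors the question's function but takes an already-opened image. Note that the one-call result is indexed [y, x] (rows first), whereas the question's array is indexed [x, y].

```python
import numpy as np
from PIL import Image


def pixels_to_array(pic):
    """Hypothetical helper: convert a whole PIL image in one call.

    For an 8-bit RGB image this returns a writable uint8 array of shape
    (height, width, 3), indexed [y, x, channel].
    """
    return np.array(pic)


def image_to_matrix_loop(pic):
    """Hypothetical helper mirroring the question's loop: shape (width, height),
    indexed [x, y], one Python object (e.g. an RGB tuple) per element."""
    width, height = pic.size
    array = np.empty((width, height), dtype=object)
    for x in range(width):
        for y in range(height):
            array[x, y] = pic.getpixel((x, y))
    return array


if __name__ == "__main__":
    img = Image.new("RGB", (4, 3))        # 4 pixels wide, 3 pixels high, all black
    img.putpixel((2, 1), (10, 20, 30))
    fast = pixels_to_array(img)
    slow = image_to_matrix_loop(img)
    print(fast.shape, fast.dtype)          # rows first, then columns, then channels
    print(tuple(fast[1, 2]), slow[2, 1])   # same pixel, different index order
```

If the rest of the code relies on the [x][y] order, arr.swapaxes(0, 1) returns that layout as a view without copying. For palette ("P" mode) BMP files, both versions hold palette indices rather than RGB tuples; calling pic.convert("RGB") first would give RGB values.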

And this is how I call it:

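import multiprocessing
import time

import POF  # the question's own module that defines image_to_matrix_mp


def main():
    start_time = time.time()

    p = multiprocessing.Pool(processes=5)
    [land_prov_array, areas_array, regions_array, countries_array, sea_prov_array] = p.map(
        POF.image_to_matrix_mp,
        ['land_provinces.bmp',
         'land_areas.bmp',
         'land_regions.bmp',
         'countries.bmp',
         'sea_provinces.bmp'])
    p.close()
    p.join()  # wait for the worker processes to exit

    width = len(land_prov_array)         # Width dimension of the map
    height = len(land_prov_array[0])     # Height dimension of the map

    print("All images read in --- %s seconds ---" % (time.time() - start_time))


# Needed with the "spawn" start method (the default on Windows and macOS), where each worker re-imports the main module
if __name__ == "__main__":
    main()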
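To check where the extra time goes, a minimal diagnostic sketch can time the work done inside each worker separately from the total time the parent waits for map() to return. It relies on the fact that multiprocessing.Pool sends each return value back to the parent as a pickle. build_object_array and time_pool_map are hypothetical names; build_object_array only imitates the kind of output image_to_matrix_mp produces (an object array of tuples) and does not read any image.

```python
import multiprocessing
import time

import numpy as np


def build_object_array(n):
    """Hypothetical stand-in for image_to_matrix_mp: fill an n x n object array
    with tuples and return (seconds spent inside the worker, array)."""
    t0 = time.perf_counter()
    array = np.empty((n, n), dtype=object)
    for row in range(n):
        for col in range(n):
            array[row, col] = (row % 256, col % 256, 0)
    return time.perf_counter() - t0, array


def time_pool_map(n, workers=2):
    """Return (slowest in-worker time, total time until the parent has every result)."""
    t0 = time.perf_counter()
    with multiprocessing.Pool(processes=workers) as pool:
        results = pool.map(build_object_array, [n] * workers)
    total = time.perf_counter() - t0
    slowest_worker = max(elapsed for elapsed, _ in results)
    return slowest_worker, total


if __name__ == "__main__":
    slowest_worker, total = time_pool_map(400)
    # total - slowest_worker covers pool start-up and shutdown plus pickling,
    # sending and unpickling the returned arrays.
    print("slowest worker: %.2f s, parent total: %.2f s" % (slowest_worker, total))
```

If the gap between the two numbers grows much faster than the in-worker time as n increases, the cost is in moving the results rather than in computing them.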

For debugging, I print a line every 1,000 columns of the matrix, and then print the total time the read step took. Here is the end of the output:

x = 15000, --- 84.4389169216156 seconds ---
x = 15000, --- 84.94356632232666 seconds ---
x = 15000, --- 85.07920360565186 seconds ---
x = 15000, --- 85.1400408744812 seconds ---
x = 15000, --- 85.99774622917175 seconds ---
x = 16000, --- 89.95117163658142 seconds ---
x = 16000, --- 90.62337279319763 seconds ---
x = 16000, --- 90.62437009811401 seconds ---
x = 16000, --- 90.76798582077026 seconds ---
x = 16000, --- 91.90195274353027 seconds ---
All images read in --- 275.9242513179779 seconds ---

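These images are 16,200 x 6,000. As you can see here, all 5 images are read in within the first hundred seconds. However, the code takes another 175 seconds before it continues, which means that finishing the multiprocessing takes almost twice as long as the function it actually runs. Is this normal overhead for multiprocessing, or am I doing something wrong?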
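Some arithmetic puts these numbers in perspective: each image is 16,200 x 6,000 = 97,200,000 pixels, so with dtype=tuple (which NumPy treats as dtype=object) each returned array holds about 97 million separate Python objects, roughly 486 million across all five images. Pool.map returns every worker's result to the parent by pickling it, so all of those objects have to be serialized, sent through a pipe and rebuilt after the workers have finished their loops. The following is a minimal sketch, not a measurement of the original program, comparing the pickle round-trip of an object array of tuples with the same pixel values stored as a uint8 array; object_tuple_array and pickle_cost are hypothetical helper names.

```python
import pickle
import time

import numpy as np


def object_tuple_array(height, width):
    """Hypothetical helper: one Python (R, G, B) tuple per pixel, like dtype=tuple."""
    array = np.empty((height, width), dtype=object)
    for row in range(height):
        for col in range(width):
            array[row, col] = (row % 256, col % 256, 0)
    return array


def pickle_cost(array):
    """Hypothetical helper: return (pickled size in bytes, seconds for dumps + loads)."""
    t0 = time.perf_counter()
    data = pickle.dumps(array, protocol=pickle.HIGHEST_PROTOCOL)
    pickle.loads(data)
    return len(data), time.perf_counter() - t0


if __name__ == "__main__":
    as_objects = object_tuple_array(500, 500)
    # The same pixel values packed into a (height, width, 3) uint8 array.
    as_uint8 = np.array(as_objects.tolist(), dtype=np.uint8)
    for name, arr in (("object/tuple", as_objects), ("uint8", as_uint8)):
        size, seconds = pickle_cost(arr)
        print("%-12s %10d bytes %8.3f s" % (name, size, seconds))
```

For a numeric dtype, NumPy pickles the array's raw data buffer in one piece, while an object array is pickled element by element, which is why the uint8 version is expected to be much cheaper to send between processes.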

Tags: python, numpy, multiprocessing, python-imaging-library

Solution

