How to get the greatest number in a list of numbers using multiprocessing

Problem description

I have a list of random numbers, and I want to find the greatest number using multiprocessing.

This is the code I used to generate the list:

import random
randomlist = []
for i in range(100000000):
    n = random.randint(1,30000000)
    randomlist.append(n)

To get the greatest number with a serial process:

import time

greatest = 0 # global variable

def f(n):
    global greatest
    if n>greatest:
        greatest = n

if __name__ == "__main__":
    # `global` is only needed inside functions; this block already runs at module level.
    t2 = time.time()
    greatest = 0

    for x in randomlist:
        f(x)

    print("serial process took:", time.time()-t2)
    print("greatest = ", greatest)

This is my attempt to get the greatest number using multiprocessing:

from multiprocessing import Pool
import time

greatest = 0 # the global variable

def f(n):
    global greatest
    if n>greatest:
        greatest = n

if __name__ == "__main__":
    greatest = 0
    t1 = time.time()
    p = Pool() #(processes=3) 
    result = p.map(f,randomlist)
    p.close()
    p.join()
    print("pool took:", time.time()-t1)
    print("greatest = ", greatest)

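The output here is 0. Evidently the global variable is not shared across the worker processes. How can I solve this without hurting performance?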

Tags: python, multithreading, multiprocessing

Solution


As @Barmar suggested, split your randomlist into chunks, compute the local maximum of each chunk, and finally compute the global maximum from local_maximum_list:

import multiprocessing as mp
import numpy as np
import random
import time

CHUNKSIZE = 10000

def local_maximum(l):
    m = max(l)
    print(f"Local maximum: {m}")
    return m

if __name__ == '__main__':
    randomlist = np.random.randint(1, 30000000, 100000000)

    start = time.time()
    chunks = (randomlist[i:i+CHUNKSIZE]
                  for i in range(0, len(randomlist), CHUNKSIZE))

    with mp.Pool(mp.cpu_count()) as pool:
        local_maximum_list = pool.map(local_maximum, chunks)
    print(f"Global maximum: {max(local_maximum_list)}")
    end = time.time()
    print(f"MP Elapsed time: {end-start:.2f}s")
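The attempt in the question prints 0 because each worker process updates its own copy of `greatest`; the parent's variable is never touched. Returning a value from each task avoids the problem. For plain Python lists, the same chunking idea can be sketched with the standard library only. The helper names `make_chunks` and `parallel_max` and the default chunk size are assumptions made for this illustration, not part of the original answer:

```python
import multiprocessing as mp
import random


def make_chunks(values, chunksize):
    """Yield consecutive slices of `values`, each at most `chunksize` long."""
    for start in range(0, len(values), chunksize):
        yield values[start:start + chunksize]


def parallel_max(values, chunksize=100_000, processes=None):
    """Return max(values), computing per-chunk maxima in worker processes."""
    if len(values) == 0:
        raise ValueError("parallel_max() requires a non-empty sequence")
    with mp.Pool(processes) as pool:
        # Each worker returns its chunk's maximum; no state is shared.
        local_maxima = pool.map(max, make_chunks(values, chunksize))
    # Reduce the per-chunk results in the parent process.
    return max(local_maxima)


if __name__ == "__main__":
    data = [random.randint(1, 30_000_000) for _ in range(200_000)]
    result = parallel_max(data)
    assert result == max(data)
    print("greatest =", result)
```

Every chunk still has to be pickled and sent to a worker, so for a cheap operation like `max` this may well be slower than the built-in `max(data)` on a single core; measure before relying on it.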

Performance

It is interesting to see how much the way the random list is created affects multiprocessing performance:

Scenario 1:
randomlist = np.random.randint(1, 30000000, 100000000)
MP Elapsed time: 1.63s

Scenario 2:
randomlist = np.random.randint(1, 30000000, 100000000).tolist()
MP Elapsed time: 6.02s

Scenario 3:
randomlist = [random.randint(1, 30000000) for _ in range(100000000)]
MP Elapsed time: 7.14s

Scenario 4:
randomlist = list(np.random.randint(1, 30000000, 100000000))
MP Elapsed time: 184.28s

Scenario 5:
randomlist = []
for _ in range(100000000):
    n = random.randint(1, 30000000)
    randomlist.append(n)
MP Elapsed time: 7.52s
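One plausible reason for these gaps (an interpretation, not something the answer measured) is serialization: `pool.map` pickles every chunk before sending it to a worker. An ndarray slice pickles as one compact buffer, `tolist()` yields plain Python ints, and `list(arr)` yields a list of NumPy scalar objects that are pickled one by one. The sketch below compares the pickling cost of one chunk in each form; `pickle_cost` is a hypothetical helper written for this illustration:

```python
import pickle
import time

import numpy as np


def pickle_cost(chunk):
    """Return (seconds, bytes) needed to pickle `chunk` once."""
    start = time.perf_counter()
    payload = pickle.dumps(chunk, protocol=pickle.HIGHEST_PROTOCOL)
    return time.perf_counter() - start, len(payload)


if __name__ == "__main__":
    arr = np.random.randint(1, 30_000_000, 10_000)
    candidates = {
        "ndarray slice": arr,
        "arr.tolist() (Python ints)": arr.tolist(),
        "list(arr) (NumPy scalars)": list(arr),
    }
    for label, chunk in candidates.items():
        seconds, size = pickle_cost(chunk)
        print(f"{label:30s} {seconds * 1e3:8.3f} ms {size:10d} bytes")
```

The absolute numbers depend on the machine and the NumPy version, so treat the output as a relative comparison only.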
