How to speed up many iterations of a nonlinear optimization in Python?

Problem description

I am trying to solve a nonlinear optimization problem using the PyGMO package. The optimization class is defined separately and then invoked through a dedicated function, dyn_optimGMO. The optimization has to be run, and the results saved, for 1000 random initial vectors of the variables (nodes), stored in inits (or init_val).

Using the timeit module, I found that each iteration takes about 17 seconds to complete. That means roughly 5 hours for 1000 iterations, which is a very long time.
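For reference, this is roughly how a single call can be timed (a minimal sketch; it assumes dyn_optimGMO, matL, node_len and init_val are already set up as in the code below):

import timeit

# time one full optimization call; number=1 since a single call already takes seconds
t = timeit.timeit(lambda: dyn_optimGMO(matL, node_len, init_val), number=1)
print('one iteration took', round(t, 2), 's')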

If I have to repeat this for 20 perturbed nodes, the total number of iterations becomes 200000, with the runtime scaling linearly as computed above.

I tried to fix this by parallelizing each set of 1000 iterations for each of the 20 perturbed nodes with Python's multiprocessing module, but it did not help.

I also tried using Numba's jit functions, but they cannot recognize the pyGMO module and therefore fail.

Is there any way to parallelize this code and make it faster for an arbitrary number of iterations?

Please let me know if my question is clear enough; if not, I will add details as needed.

import random

import numpy as np
import pygmo as pg

matL = np.random.rand(300,300) ; node_len = 300

inits = []; results = []


perturb = {25:0} #setting a random node, say, node 25 to 0

class my_constrained_udp:
    
    def __init__(self):
        pass
    
    def fitness(self, x):
        matA = np.matrix(x)
        obj1 = matA.dot(matL).dot(matA.T)[0,0] # quadratic form x L x^T, extracted as a scalar
        ce1 = sum(init_val) - sum(x) # init_val is the module-level initial vector
        return [obj1, ce1]
   
    def get_nobj(self): # number of objectives (the hook pygmo looks for)
        return 1

    def get_nec(self): # number of equality constraints
        return 1

    def get_nic(self): # number of inequality constraints
        return 0


    def get_bounds(self): #lower and upper bounds: use this to perturb the node
        lowerB = np.array([0]*node_len); upperB = np.array([1]*node_len)
        if perturb:
            for k,v in perturb.items():
                lowerB[k] = v; upperB[k] = v
        return (lowerB,upperB)

  
    def gradient(self, x):
        return pg.estimate_gradient_h(lambda x: self.fitness(x), x)


def dyn_optimGMO(matL, node_len, init_val):

    if perturb:
        for k,v in perturb.items(): init_val[k] = v # pin the perturbed entries of the initial vector
    
    inner_algo = pg.nlopt("slsqp"); inner_algo.maxeval = 5
    
    algo = pg.algorithm(uda = pg.mbh(inner_algo, stop = 2, perturb = .2))
    #algo.set_verbosity(10) # in this case that corresponds to one log line per call to slsqp
    pop = pg.population(prob = my_constrained_udp(), size = 1000 , seed=123)
    pop.problem.c_tol = [1E-6] * 1 # get_nec + get_nic = 1, so multiplied by 1
    pop = algo.evolve(pop) 
   
    res = pop.champion_x   
    return res

# running above optimization code for 1000 random initializations

for i in range(1000):
    init_val = np.array([random.uniform(0, 1) for k in range(node_len)])
    
    if perturb:
        for k,v in perturb.items(): init_val[k] = v  #setting perturbations in initial values
    
    res = dyn_optimGMO(matL, node_len, init_val) # this function is defined above
    
    inits.append(init_val); results.append(res)

Edit 1:

As @Ananda suggested below, I changed the objective function, which cut the runtime by almost a factor of 7. I have rewritten the code to run the 1000 iterations with Python's multiprocessing module. Below is the new code, in which I try to spawn processes to handle the iterations in parallel. Since my system has only 8 threads, I limited the pool size to 5, because PyGMO uses internal parallelization as well and needs some threads for itself.

import random

import numpy as np
import pygmo as pg


matL = np.random.rand(300,300) ; node_len = 300

perturb = {12:1} # assign your perturb ID here

def optimizationFN(var):

    results = []
    
    inits = var[0]; perturb = var[1]

    
    class my_constrained_udp:
        
        def fitness(self, x):
            obj1 = (x[None,:] @ matL @ x[:,None])[0,0] # @ is the matrix-multiplication operator; [0,0] extracts the scalar from the 1x1 result
            ce1 = np.sum(inits) - np.sum(x)
            return [obj1, ce1]
       
        def get_nobj(self): # number of objectives (the hook pygmo looks for)
            return 1
        
        def get_nec(self): #no of equality constraints
            return 1    
        
        def get_nic(self): #no of in-equality constraints
            return 0                    
        
        def get_bounds(self): #lower and upper bounds: use this to perturb the node
            lowerB = np.array([0]*node_len); upperB = np.array([1]*node_len)
            if perturb:
                for k,v in perturb.items():
                    lowerB[k] = v; upperB[k] = v
            return (lowerB,upperB)
        
        def gradient(self, x):
            return pg.estimate_gradient_h(lambda x: self.fitness(x), x)
    
    def dyn_optimGMO(matL, node_len ,inits):
        '''
        perturb should be a dict of node index and value as 0 or 1. Type(node_index) = int
        '''  
        if perturb:
            for k,v in perturb.items(): inits[k] = v  #setting perturbations in initial values
            
        inner_algo = pg.nlopt("slsqp"); inner_algo.maxeval = 5
        
        algo = pg.algorithm(uda = pg.mbh(inner_algo, stop = 2, perturb = .2))
        
        #algo.set_verbosity(10) # in this case that corresponds to one log line per call to slsqp
        
        pop = pg.population(prob = my_constrained_udp(), size = 100 , seed=123)
        
        pop.problem.c_tol = [1E-6] * 1 # get_nec + get_nic = 1, so multiplied by 1
        pop = algo.evolve(pop) 
       
        res = pop.champion_x   
        return res
    
    
    if perturb:
        for k,v in perturb.items(): inits[k] = v  #setting perturbations in initial values
    
    res = dyn_optimGMO(matL, node_len, inits) # the function defined just above
    
    results.append(res)
    
    return results

import time

st = time.time()
    
#1000 random initialisations
initial_vals = []
for i in range(1000): initial_vals.append(np.array([random.uniform(0, 1) for k in range(node_len)]))
initial_vals = np.array(initial_vals)

inp_val = []
for i in range(len(initial_vals)): inp_val.append([initial_vals[i],perturb])

#running evaluations
#eqVal = optimizationFN(initial_vals,perturb=perturb)
from multiprocessing import Pool


myPool = Pool(5) # limited to 5 so that PyGMO's own internal threading still has cores left

data = myPool.map(optimizationFN,inp_val)

myPool.close(); myPool.join()


print('Total Time: ',round(time.time()-st,4))

This finishes in 1.13 hours.

Still, is there any other way I can make this faster?

Tags: python, numpy, optimization, parallel-processing, python-multiprocessing

Solution


Before trying to parallelize and so on, try to figure out what exactly the performance bottleneck is, and address that first.
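As an aside, one way to obtain a per-line table like the ones below is line_profiler's Python API (a standalone sketch; the more common route is to decorate the function with @profile and run the script through kernprof -l -v):

from line_profiler import LineProfiler
import numpy as np

matL = np.random.rand(300, 300); init_val = np.random.rand(300)

def fitness(x): # same body as the UDP's fitness method above
    matA = np.matrix(x)
    obj1 = matA.dot(matL).dot(matA.T)[0,0]
    ce1 = sum(init_val) - sum(x)
    return [obj1, ce1]

lp = LineProfiler()
lp(fitness)(np.random.rand(300)) # wrap the function and make one profiled call
lp.print_stats() # prints a per-line timing table like the one below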

If you profile your fitness function with line_profiler, you get:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    28                                               @profile
    29                                               def fitness(self, x):
    30     96105    3577978.0     37.2      9.5          matA = np.matrix(x)
    31     96105    8548353.0     88.9     22.7          obj1 = matA.dot(matL).dot(matA.T)[0,0] #this is int value
    32     96105   25328835.0    263.6     67.4          ce1 = sum(init_val) - sum(x)
    33     96105     121800.0      1.3      0.3          return [obj1, ce1]

As you can see, most of the time is spent in the dot and sum calls, and a substantial amount also goes into creating matA.

I would rewrite the function like this:

def fitness(self, x):

    obj1 = x[None, :] @ matL @ x[:, None]
    ce1 = np.sum(init_val) - np.sum(x)

    return [obj1, ce1]

If you profile this version, you can see:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                               @profile
    21                                               def fitness(self, x):
    22                                           
    23     77084    3151649.0     40.9     48.9          obj1 = x[None, :] @ matL @ x[:, None]
    24     77084    3214012.0     41.7     49.9          ce1 = np.sum(init_val) - np.sum(x)
    25                                           
    26     77084      79439.0      1.0      1.2          return [obj1, ce1]

The total per-hit time of the full function drops from roughly 390 (37.2 + 88.9 + 263.6 + 1.3) to under 85 (40.9 + 41.7 + 1.0).

Using np.matrix is no longer recommended and it is going to be deprecated, and using the native Python sum instead of np.sum degrades performance considerably.
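The sum gap is easy to verify in isolation (a quick standalone check, using a 300-element vector as in the problem):

import timeit
import numpy as np

x = np.random.rand(300)
print('builtin sum:', timeit.timeit(lambda: sum(x), number=10000))    # iterates element by element in Python
print('np.sum     :', timeit.timeit(lambda: np.sum(x), number=10000)) # one vectorized reduction in C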

On my machine, this takes the performance from about 33 s/iteration to 6 s/iteration, roughly a 5x improvement.

