首页 > 解决方案 > 如何根据分布中的峰值数量创建要优化的扩展函数?

问题描述

我想在一组单独的高斯分布中分解一个分布。例如:

 f = np.array([ 3.00007453,  3.00080131,  3.00670925,  3.04374984,  3.22218031,
        3.87874542,  5.70679883,  9.49405098, 15.13899976, 20.70462534,
       23.27772517, 21.74836598, 18.51405863, 17.61004214, 20.87567502,
       25.97834859, 28.41103287, 25.85310026, 20.47067556, 16.63620493,
       16.69447783, 19.10087608, 20.27772522, 18.05713464, 13.31940778,
        8.52009358,  5.30079298,  3.74693462,  3.18885332,  3.03718737,
        3.00570287,  3.00068111,  3.00006335,  3.00000459,  3.00000026,
        3.00000001,  3.        ,  3.        ,  3.        ,  3.        ,
        3.        ,  3.        ,  3.        ,  3.        ,  3.        ,
        3.        ,  3.        ,  3.        ,  3.        ])

plt.plot(f)

在此处输入图像描述

分布可以包含许多峰。我编写了一个代码来评估峰值的数量并优化到解释返回所有高斯曲线参数的峰值所需的高斯数量。

from scipy.optimize import curve_fit
    from scipy.signal import find_peaks


def Gauss_1(xdata, H,  A, mean, std):
    return H + A*np.exp(-(xdata-mean)**2/(2*std**2))

def Gauss_2(xdata, H1, A1, mean1, std1, A2, mean2, std2, ):
    return H1 + A1*np.exp(-(xdata-mean1)**2/(2*std1**2)) + A2*np.exp(-(xdata-mean2)**2/(2*std2**2))

def Gauss_3(xdata, H1, A1, mean1, std1, A2, mean2, std2,  A3, mean3, std3 ):
    return H1 + A1*np.exp(-(xdata-mean1)**2/(2*std1**2)) + A2*np.exp(-(xdata-mean2)**2/(2*std2**2)) + A3*np.exp(-(xdata-mean3)**2/(2*std3**2))

def Gauss_4(xdata, H1, A1, mean1, std1, A2, mean2, std2, A3, mean3, std3,  A4, mean4, std4):
    return H1 + A1*np.exp(-(xdata-mean1)**2/(2*std1**2)) + A2*np.exp(-(xdata-mean2)**2/(2*std2**2)) + A3*np.exp(-(xdata-mean3)**2/(2*std3**2)) + A4*np.exp(-(xdata-mean4)**2/(2*std4**2))

def Gauss_5(xdata, H1, A1, mean1, std1,  A2, mean2, std2,  A3, mean3, std3,  A4, mean4, std4, A5, mean5, std5 ):
    return H1 + A1*np.exp(-(xdata-mean1)**2/(2*std1**2))  + A2*np.exp(-(xdata-mean2)**2/(2*std2**2)) + A3*np.exp(-(xdata-mean3)**2/(2*std3**2)) + A4*np.exp(-(xdata-mean4)**2/(2*std4**2)) + A5*np.exp(-(xdata-mean5)**2/(2*std5**2))


def get_popt(f):
    means = find_peaks(f, prominence=0.05, width=1)[0]
    if len(means) == 0:
        return []
    elif len(means) == 1:
        popt, pcov = curve_fit(Gauss_1, np.arange(len(f)), f, p0=[0, f[means[0]], means[0], 3, 
                                                                    f[means[1]], means[1], 3], 
                              bounds=([0, 0.9*f[means[0]], 0.9*means[0], 0],
                                      [1, 1.1*f[means[0]], 1.1*means[0], 10]))
    elif len(means) == 2:
        popt, pcov = curve_fit(Gauss_2, np.arange(len(f)), f, p0=[0, f[means[0]], means[0], 3, 
                                                                 f[means[1]], means[1], 3], 
                              bounds=([0, 0.9*f[means[0]], 0.9*means[0], 0, 0.9*f[means[1]], 0.9*means[1], 0],
                                      [1, 1.1*f[means[0]], 1.1*means[0], 10, 1.1*f[means[1]], 1.1*means[1], 10]))
    elif len(means) == 3:
        popt, pcov = curve_fit(Gauss_3, np.arange(len(f)), f, p0=[0, f[means[0]], means[0], 3, 
                                                                    f[means[1]], means[1], 3,
                                                                    f[means[2]], means[2], 3])
                
    elif len(means) == 4:
        popt, pcov = curve_fit(Gauss_4, np.arange(len(f)), f, p0=[0, f[means[0]], means[0], 3, 
                                                                   f[means[1]], means[1], 3,
                                                                   f[means[2]], means[2], 3,
                                                                   f[means[3]], means[3], 3],
                                bounds=([0, 0.9*f[means[0]], 0.9*means[0], 0, 0.9*f[means[1]], 0.9*means[1], 0, 0.9*f[means[2]], 0.9*means[2], 0],
                                      [1, 1.1*f[means[0]], 1.1*means[0], 10, 1.1*f[means[1]], 1.1*means[1], 10, 1.1*f[means[2]], 1.1*means[2], 10]))
    else:
        popt, pcov = curve_fit(Gauss_5, np.arange(len(f)), f, p0=[0, f[means[0]], means[0], 3, 
                                                                   f[means[1]], means[1], 3,
                                                                   f[means[2]], means[2], 3, 
                                                                    f[means[3]], means[3], 3, 
                                                                   f[means[3]], means[3], 3])
    return popt

有没有办法组装高斯函数的数量和参数,使其仅由峰值数量定义,而没有所有这些 if / else 比较和单个函数定义?

标签: pythonoptimizationgaussian

解决方案


您正在寻找已经完成的高斯混合的实现。因此,理想情况下,使用而不是您的实现。scikit-learn

但是,要回答您的问题,您只需要一个矢量化版本,Gauss_X可以通过使用 numpy 矢量化轻松实现。据我了解,它是给定高斯混合的似然函数。

import numpy as np
def Gauss_N(xdata, means, stds, mixture_weights):
    nominator = (np.tile(xdata, (len(means), 1)) - means.reshape(-1, 1))**2
    denom = np.tile((2*stds**2), (len(xdata), 1)).T
    densities = np.exp(nominator/denom)  # note, you need to also add a normalizing constant!
    likelihoods = np.sum(densities, axis=1)
    likelihood = np.sum(likelihoods*mixture_weights)
    return likelihood

xdata = np.array([4, 5, 6])
means = np.array([1, 2])
stds = np.array([1, 2])
mixture_weights = np.array([0.3, 0.5])
Gauss_N(xdata, means, stds, mixture_weights)  # simply works

现在您应该可以popt, pcov = curve_fit(Gauss_N, ...)在每个if-else子句中进行操作了。


这可能无法开箱即用,您的示例在许多方面都非常复杂,但应该让您了解如何完成。


推荐阅读