首页 > 解决方案 > 猜测将 Voigt 曲线拟合到数据 - Python 脚本行为不规律

问题描述

让我描述一下我正在尝试做的事情。这需要比我更了解 Python 的人的眼睛。

我有一组数据(实际上是沉积物直径与样品中的百分比),绘制时它显示了一个独特的光谱。我假设数据中隐藏着“模式”,并试图强制拟合 voigt、高斯或洛伦兹曲线,以便提取一些信息。这个脚本的框架来自一个在 XRD 数据上做类似事情的人。我还不够熟练,无法真正理解脚本是如何实现目标的,所以我在隔离一些奇怪的行为时遇到了麻烦。让我先概述一下奇怪之处,然后我将分享代码。

  1. 如果我用相同的数据一遍又一遍地运行代码,结果并不总是相同的。不仅如此,而且在 25% 的情况下,我会收到一个我无法弄清楚的错误。为什么会发生此错误,为什么仅在某些时候发生?

TypeError: 不支持的操作数类型 -: 'tuple' 和 'float'

  1. 当我在代码开头定义“规范”时,我必须指定模型类型。一次偶然的机会,我首先尝试了 VoigtModel,而且大部分时间它都能正常工作。但是,如果我将类型指定为 Gaussian 或 Lorentzian,则脚本根本不会运行:

TypeError:不能将序列乘以“float”类型的非整数

  1. 在脚本中,我要求它打印一些关于它适合的曲线的信息。具体来说,曲线峰值的 x、y 值。但是,当我随后运行它时,它可能适合不同的曲线,但 print() 输出不会改变。像什么?

如果有人可以试一试该代码,或许可以提供一些关于该代码有什么问题的见解,我将不胜感激。

编辑我发现如果我添加更多 {'type': 'VoigtModel'} 到 spec = ,脚本失败的频率会降低。如果我删除一些(留下一两个),那么它会以更大的百分比失败。仍然可以使用一些帮助来理解连接。

编码:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
import random

from lmfit import models

x = 0, 0.09326263, 0.186541806, 0.279826296, 0.373096863, 0.466372043, 0.559644359, 0.652910952, 0.746190193, 0.839463682, 0.932734784, 1.026014714, 1.119288717, 1.212558343, 1.305836463, 1.399111865, 1.492381488, 1.585657384, 1.678931325, 1.772207061, 1.865478378, 1.958752334, 2.05202538, 2.145299504, 2.238574433, 2.331847735, 2.425123471, 2.518395825, 2.611671451, 2.704945386, 2.798218396, 2.891491964, 2.984766114, 3.078040106, 3.171314505, 3.264585057, 3.357863555, 3.451137678, 3.544409886, 3.637684839, 3.730956661, 3.824229504, 3.917507936, 4.010781777, 4.104055591, 4.197326, 4.290603266, 4.383874926, 4.477149297, 4.57042345, 4.663698494, 4.756972396, 4.850245469, 4.943519232, 5.036793499, 5.13006734, 5.223340556, 5.316615186, 5.409888929, 5.503163537, 5.596438512, 5.689708905, 5.782986369, 5.876257098, 5.969532028, 6.062807987, 6.156078156, 6.249352461, 6.342627453, 6.43590194, 6.529177933, 6.622450379, 6.715725752, 6.808997914, 6.902272777, 6.995546352, 7.088819796, 7.18209372, 7.275367937, 7.36864248, 7.461916216, 7.555189618, 7.648464489, 7.741737739, 7.835015624, 7.928288902, 8.021559911, 8.114833257, 8.208110415, 8.301378965, 8.394658258, 8.487929146, 8.581205011, 8.674478952, 8.767749555, 8.861024001, 8.954299075, 9.047574353, 9.140848269, 9.234120373, 9.327394253, 9.420668151, 9.513942544, 9.607217038, 9.700491238, 9.793764758, 9.887039268, 9.980313168, 10.0735868, 10.16686092, 10.26013875, 10.35340805, 10.44668356, 10.53995856, 10.63323182, 10.72650553
y = 0.001352, 0.001721, 0.002661, 0.00523, 0.010879, 0.020142, 0.030427, 0.039188, 0.046922, 0.055438, 0.065352, 0.076432, 0.089913, 0.107888, 0.132296, 0.164797, 0.208043, 0.266067, 0.343688, 0.443698, 0.565158, 0.704086, 0.854979, 1.01437, 1.17932, 1.34739, 1.51366, 1.67215, 1.81638, 1.94147, 2.0432, 2.11934, 2.16792, 2.19005, 2.18907, 2.17172, 2.14565, 2.11866, 2.09749, 2.08736, 2.09102, 2.1084, 2.13739, 2.17478, 2.21729, 2.26139, 2.30342, 2.33966, 2.36671, 2.38045, 2.37413, 2.33769, 2.26088, 2.13908, 1.9769, 1.78619, 1.57832, 1.35944, 1.13483, 0.919488, 0.743312, 0.637312, 0.615423, 0.665356, 0.744581, 0.78791, 0.743882, 0.617121, 0.46602, 0.356204, 0.320677, 0.361725, 0.45788, 0.566712, 0.650727, 0.701846, 0.739237, 0.788714, 0.863346, 0.956347, 1.04314, 1.09353, 1.0874, 1.02493, 0.925497, 0.815472, 0.721377, 0.658056, 0.628985, 0.623906, 0.617012, 0.578717, 0.487132, 0.346259, 0.185964, 0.066494, 0.011942, 0.000815, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
#xlog = [math.log(xval) for xval in x]

spec = {
    'x': x,
    'y': y,
    'model': [
        {'type': 'VoigtModel'},
        {'type': 'VoigtModel'},
        {'type': 'VoigtModel'},
        {'type': 'VoigtModel'},
    ]}

plt.plot(spec['x'], spec['y'])
plt.show()


def update_spec_from_peaks(spec, model_indicies, peak_widths=(1, 50), **kwargs):
    x = spec['x']
    y = spec['y']
    x_range = np.max(x) - np.min(x)
    peak_indicies = signal.find_peaks_cwt(y, peak_widths)
    np.random.shuffle(peak_indicies)
    for peak_indicie, model_indicie in zip(peak_indicies.tolist(), model_indicies):
        model = spec['model'][model_indicie]
        if model['type'] in ['GaussianModel', 'LorentzianModel', 'VoigtModel']:
            params = {
                'height': y[peak_indicie],
                'sigma': x_range / len(x) * np.min(peak_widths),
                'center': x[peak_indicie]
            }
            if 'params' in model:
                model.update(params)
            else:
                model['params'] = params
    return peak_indicies

# 
peaks_found = update_spec_from_peaks(spec, [0], peak_widths=(5,))    
print(peaks_found)

for i in peaks_found:
 print(x[i], y[i])


def generate_model(spec):
    composite_model = None
    params = None
    x = spec['x']
    y = spec['y']
    x_min = np.min(x)
    x_max = np.max(x)
    x_range = x_max - x_min
    y_max = np.max(y)
    for i, basis_func in enumerate(spec['model']):
        prefix = f'm{i}_'
        model = getattr(models, basis_func['type'])(prefix=prefix)
        if basis_func['type'] in ['GaussianModel', 'LorentzianModel', 'VoigtModel']: # for now VoigtModel has gamma constrained to sigma
            model.set_param_hint('sigma', min=1e-6, max=x_range)
            model.set_param_hint('center', min=x_min, max=x_max)
            model.set_param_hint('height', min=1e-6, max=1.1*y_max)
            model.set_param_hint('amplitude', min=1e-6)
            # default guess is horrible!! do not use guess()
            default_params = {
                prefix+'center': x_min + x_range * random.random(),
                prefix+'height': y_max * random.random(),
                prefix+'sigma': x_range * random.random()
            }
        else:
            raise NotImplemented(f'model {basis_func["type"]} not implemented yet')
        if 'help' in basis_func:  # allow override of settings in parameter
            for param, options in basis_func['help'].items():
                model.set_param_hint(param, **options)
        model_params = model.make_params(**default_params, **basis_func.get('params', {}))
        if params is None:
            params = model_params
        else:
            params.update(model_params)
        if composite_model is None:
            composite_model = model
        else:
            composite_model = composite_model + model
    return composite_model, params


model, params = generate_model(spec)
output = model.fit(spec['y'], params, x=spec['x'])
fig, ax = plt.subplots()
ax.scatter(spec['x'], spec['y'], s=4)
components = output.eval_components(x=spec['x'])
print(len(spec['model']))
for i, model in enumerate(spec['model']):
    ax.plot(spec['x'], components[f'm{i}_'])```

标签: pythonnumpyscipylmfit

解决方案


很明显,当给定相同的输入时,任何代码每次都将完全相同地运行。

拟合似乎表现不稳定,因为您故意提供不稳定的数据。实际上,您是在告诉它随机化初始起始值。而且您正在以编程方式设置界限,而不是检查初始值与界限的接近程度。所以,问问自己,你为什么要做这些事情?

你的代码看起来很复杂,可能太复杂了,以至于你不理解它。从摆脱所有垃圾开始。也许制作一个高斯总和的模型,也许是这样的(代码将运行,给出合适的拟合):

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

from lmfit import models

x = np.array([0, 0.09326263, 0.186541806, 0.279826296, 0.373096863, 0.466372043, 0.559644359, 0.652910952, 0.746190193, 0.839463682, 0.932734784, 1.026014714, 1.119288717, 1.212558343, 1.305836463, 1.399111865, 1.492381488, 1.585657384, 1.678931325, 1.772207061, 1.865478378, 1.958752334, 2.05202538, 2.145299504, 2.238574433, 2.331847735, 2.425123471, 2.518395825, 2.611671451, 2.704945386, 2.798218396, 2.891491964, 2.984766114, 3.078040106, 3.171314505, 3.264585057, 3.357863555, 3.451137678, 3.544409886, 3.637684839, 3.730956661, 3.824229504, 3.917507936, 4.010781777, 4.104055591, 4.197326, 4.290603266, 4.383874926, 4.477149297, 4.57042345, 4.663698494, 4.756972396, 4.850245469, 4.943519232, 5.036793499, 5.13006734, 5.223340556, 5.316615186, 5.409888929, 5.503163537, 5.596438512, 5.689708905, 5.782986369, 5.876257098, 5.969532028, 6.062807987, 6.156078156, 6.249352461, 6.342627453, 6.43590194, 6.529177933, 6.622450379, 6.715725752, 6.808997914, 6.902272777, 6.995546352, 7.088819796, 7.18209372, 7.275367937, 7.36864248, 7.461916216, 7.555189618, 7.648464489, 7.741737739, 7.835015624, 7.928288902, 8.021559911, 8.114833257, 8.208110415, 8.301378965, 8.394658258, 8.487929146, 8.581205011, 8.674478952, 8.767749555, 8.861024001, 8.954299075, 9.047574353, 9.140848269, 9.234120373, 9.327394253, 9.420668151, 9.513942544, 9.607217038, 9.700491238, 9.793764758, 9.887039268, 9.980313168, 10.0735868, 10.16686092, 10.26013875, 10.35340805, 10.44668356, 10.53995856, 10.63323182, 10.72650553])

y = np.array([0.001352, 0.001721, 0.002661, 0.00523, 0.010879, 0.020142, 0.030427, 0.039188, 0.046922, 0.055438, 0.065352, 0.076432, 0.089913, 0.107888, 0.132296, 0.164797, 0.208043, 0.266067, 0.343688, 0.443698, 0.565158, 0.704086, 0.854979, 1.01437, 1.17932, 1.34739, 1.51366, 1.67215, 1.81638, 1.94147, 2.0432, 2.11934, 2.16792, 2.19005, 2.18907, 2.17172, 2.14565, 2.11866, 2.09749, 2.08736, 2.09102, 2.1084, 2.13739, 2.17478, 2.21729, 2.26139, 2.30342, 2.33966, 2.36671, 2.38045, 2.37413, 2.33769, 2.26088, 2.13908, 1.9769, 1.78619, 1.57832, 1.35944, 1.13483, 0.919488, 0.743312, 0.637312, 0.615423, 0.665356, 0.744581, 0.78791, 0.743882, 0.617121, 0.46602, 0.356204, 0.320677, 0.361725, 0.45788, 0.566712, 0.650727, 0.701846, 0.739237, 0.788714, 0.863346, 0.956347, 1.04314, 1.09353, 1.0874, 1.02493, 0.925497, 0.815472, 0.721377, 0.658056, 0.628985, 0.623906, 0.617012, 0.578717, 0.487132, 0.346259, 0.185964, 0.066494, 0.011942, 0.000815, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])


peaks = signal.find_peaks_cwt(y, (1.5, 25))             
xstep = x.ptp() / len(x)

model, params = None, None

for i, peak_index in enumerate(peaks):
    this_model = models.GaussianModel(prefix=f'p{1+i:d}_')
    this_params = this_model.make_params(amplitude=y[peak_index], center=x[peak_index], sigma=2*xstep)
    if model is None:
        model = this_model
        params = this_params
    else:
        model += this_model
        params.update(this_params)
        
result = model.fit(y, params, x=x)
print(result.fit_report())

plt.plot(x, y, label='data')
plt.plot(x, result.best_fit, label='fit')
plt.legend()
plt.show()

它需要比这复杂得多吗?嗯,也许不是。这提供了一个不错的贴合度,尽管它可能会在 x=7 左右缺少一个微妙的肩峰。

从简单开始。尽可能保持简单。只有在简化其他事情时才增加复杂性。


推荐阅读