Finding the minimum possible input combination for a fixed output using a regression model

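Problem description

Question:

Data for two variables (x1, x2) and two outputs (y1, y2) is available to work out the relationship between the inputs and the outputs.

Ultimately, I want to know the minimum x1 that gives me specific values of y1 and y2.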

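Approaches I have considered so far:

To curve-fit the relationship between x1, x2 and y1, y2, I used a linear regression model, with only minor modifications to the example code. The inputs x1 and x2 were expanded into third-order polynomial terms (a cubic fit), because this gave me a low mean squared error.

The equations fitted to the input data are therefore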

y1 = a0 + a1*x1 + a2*x2 + a3*x1^2 + a4*x1*x2 + a5*x2^2 + a6*x1^3 + a7*x1^2*x2 + a8*x1*x2^2 + a9*x2^3

y2 = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x1*x2 + b5*x2^2 + b6*x1^3 + b7*x1^2*x2 + b8*x1*x2^2 + b9*x2^3

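Sorry if they are hard to read, but I wasn't allowed to post them as images (LaTeX code at the bottom).

The coefficients (a, b) of the equations have been solved by the linear regression model and are stored as coeffs1 and coeffs2 in the code below.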

from numpy import array, hstack
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.preprocessing import PolynomialFeatures
import sympy as sym

def create_data(n):

    # Input data
    x1 = array([0,0,0,0,0,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,60,60,60,60,60]).reshape(n, 1)
    x2 = array([100,200,300,400,500,100,200,300,400,500,100,200,300,400,500,100,200,300,400,500,100,200,300,400,500,100,200,300,400,500,100,200,300,400,500]).reshape(n, 1)
    # Corresponding outputs
    y1 = array([350.7214942,700.9404275,1049.29659,1392.818473,1727.293514,345.418542,690.4277426,1033.665635,1372.435114,1704.064163,329.6311055,658.9473636,986.6760855,1310.013523,1627.176216,303.8203576,607.4535149,909.770934,1207.960648,1499.905656,268.7786346,537.4766415,805.2109169,1069.428571,1327.209942,225.5578289,451.1324693,676.1020492,898.3732076,1114.408884,175.4807202,351.02104,526.3101109,699.6920184,867.7300064]).reshape(n, 1)
    y2 = array([12.06118197, 13.2332737,14.93878735,16.94583691,19.09960095,11.52121175,12.23713054,13.62566473,15.44234451,17.5104543,10.97161994,11.18544616,12.21149801,13.81244743,15.77033639,10.42959162,10.09739159,10.709275,12.07256479,13.9161106,9.913894093,8.999435475,9.13551907,10.24129286,11.98680831,9.445995362,7.928597333,7.508774345,8.325111871,10.02131756,9.048387975,6.938567678,5.861061317,6.345192659,8.054102881]).reshape(n, 1)

    # Combine and form inputs into a third order polynomial
    X = hstack((x1, x2))
    poly = PolynomialFeatures(degree=3)
    X = poly.fit_transform(X)
    Y = hstack((y1, y2))
    return X, Y

n = 35
X, Y = create_data(n)

xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.5)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)
print("xtest:", xtest.shape, "ytest:", ytest.shape)

lr = LinearRegression(fit_intercept=True)
model = MultiOutputRegressor(estimator=lr)

model.fit(xtrain, ytrain)
score = model.score(xtrain, ytrain)
print("Training score:", score)

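# Each estimator's coef_ follows the PolynomialFeatures column order; the constant term is in intercept_
coeffs1 = model.estimators_[0].coef_
coeffs2 = model.estimators_[1].coef_
intercept = model.estimators_[1].intercept_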

ypred = model.predict(xtest)

print("y1 MSE:%.4f" % mean_squared_error(ytest[:, 0], ypred[:, 0]))
print("y2 MSE:%.4f" % mean_squared_error(ytest[:, 1], ypred[:, 1]))

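For given y1 and y2 and known coefficients, I am obviously left with two simultaneous equations, but I am not sure of the best way to solve them to get the optimal inputs (sympy? scipy optimise?), especially given that the quantity I want to minimise is x1, one of the inputs, rather than an output.

The actual fixed outputs for this problem would be y1 = 500 and y2 = 9.

LaTeX code for the equations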

y_{2} = b_{0} + b_{1}x_{1} + b_{2}x_{2} + b_{3}x_{1}^{2} + b_{4}x_{1}x_{2} + b_{5}x_{2}^{2} + b_{6}x_{1}^{3} + b_{7}x_{1}^{2}x_{2} + b_{8}x_{1}x_{2}^{2} + b_{9}x_{2}^{3}

Tags: python, scikit-learn, scipy, linear-regression, curve-fitting

Solution
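
One possible approach, sketched under assumptions rather than taken from the original post: with two equations in two unknowns, fixing both y1 and y2 generally leaves a handful of isolated roots rather than a continuum, so "minimum x1" amounts to picking the root with the smallest x1. The sketch below refits the cubic model on all 35 points from the question (no train/test split, so the result is reproducible), runs scipy.optimize.fsolve from a grid of starting points, and keeps only the roots that lie inside the sampled input range. The pipeline, the names residual, roots and best, the starting grid and the tolerance are all illustrative choices.

```python
import numpy as np
from scipy.optimize import fsolve
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Data copied from the question: x1 in 0..60 (step 10), x2 in 100..500 (step 100)
x1 = np.repeat(np.arange(0, 70, 10), 5)
x2 = np.tile(np.arange(100, 600, 100), 7)
y1 = np.array([
    350.7214942, 700.9404275, 1049.29659, 1392.818473, 1727.293514,
    345.418542, 690.4277426, 1033.665635, 1372.435114, 1704.064163,
    329.6311055, 658.9473636, 986.6760855, 1310.013523, 1627.176216,
    303.8203576, 607.4535149, 909.770934, 1207.960648, 1499.905656,
    268.7786346, 537.4766415, 805.2109169, 1069.428571, 1327.209942,
    225.5578289, 451.1324693, 676.1020492, 898.3732076, 1114.408884,
    175.4807202, 351.02104, 526.3101109, 699.6920184, 867.7300064,
])
y2 = np.array([
    12.06118197, 13.2332737, 14.93878735, 16.94583691, 19.09960095,
    11.52121175, 12.23713054, 13.62566473, 15.44234451, 17.5104543,
    10.97161994, 11.18544616, 12.21149801, 13.81244743, 15.77033639,
    10.42959162, 10.09739159, 10.709275, 12.07256479, 13.9161106,
    9.913894093, 8.999435475, 9.13551907, 10.24129286, 11.98680831,
    9.445995362, 7.928597333, 7.508774345, 8.325111871, 10.02131756,
    9.048387975, 6.938567678, 5.861061317, 6.345192659, 8.054102881,
])
X = np.column_stack([x1, x2]).astype(float)
Y = np.column_stack([y1, y2])

# Cubic fit on all 35 points (assumption: no train/test split here)
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, Y)

target = np.array([500.0, 9.0])  # required (y1, y2) from the question


def residual(x):
    # Relative mismatch between the predicted and the required outputs
    return (model.predict(np.asarray(x, dtype=float).reshape(1, -1))[0] - target) / target


# Multi-start root finding: a cubic system can have several roots
roots = []
for s1 in np.linspace(0, 60, 7):
    for s2 in np.linspace(100, 500, 5):
        sol, _, ier, _ = fsolve(residual, [s1, s2], full_output=True)
        inside = 0 <= sol[0] <= 60 and 100 <= sol[1] <= 500
        if ier == 1 and inside and np.max(np.abs(residual(sol))) < 1e-6:
            roots.append(sol)

# Among the valid roots, keep the one with the smallest x1
best = min(roots, key=lambda r: r[0]) if roots else None
print("smallest-x1 root inside the data range (x1, x2):", best)
```

Restricting the roots to the sampled range is deliberate: a cubic fit can produce extra roots outside the data, where it is only extrapolating. If just one of the outputs were fixed, the feasible set would be a curve instead of isolated points, and scipy.optimize.minimize with an equality constraint (for example method="SLSQP") and x1 as the objective would be the more natural tool.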

