Gradient descent gives different final parameters for different initial parameters in matrix-form linear regression

Problem description

I wrote gradient descent in matrix form for linear regression, but I get a different final theta for each different initial theta. I set a very small learning rate and a large maximum number of iterations, so I expected all the final thetas to be similar.

Here is my gradient descent code:

import numpy as np

def loss(theta, X, y_obs):
    '''
    Calculates the MSE loss for given X and y.

    theta = vector of weights, dim(theta) = d+1
    X     = matrix of X with added bias units, dim(X) = n x (d+1)
    y_obs = vector of observed true labels, dim(y_obs) = n
    where n is the number of data points and d is the number of features.

    Returns the scalar MSE loss.
    '''
    y_hat = np.dot(X, theta.T)             # predicted labels, dim(y_hat) = n
    return np.mean((y_hat - y_obs) ** 2)   # MSE loss


def gradient_of_loss(theta, X, y_obs):
    '''
    Gradient of the loss with respect to theta. This expression is
    specific to linear regression with MSE loss; a different model
    for y_hat needs a different gradient.

    Returns the gradient dloss, dim(dloss) = d+1
    '''
    n = len(y_obs)
    return 1.0 / n * (theta.T @ X.T @ X - y_obs.T @ X)
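
For reference, the gradient this line computes: with L(theta) defined as in loss above, the true gradient is

    \nabla_\theta L(\theta) \;=\; \nabla_\theta \,\frac{1}{n}\lVert X\theta - y\rVert^2 \;=\; \frac{2}{n}\, X^\top (X\theta - y)

The code returns 1/n * (theta.T @ X.T @ X - y_obs.T @ X), which equals (1/n)(Xθ − y)ᵀX: the same direction but without the factor of 2. That factor only rescales the effective learning rate, so it is not what makes different starting points end in different places.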



def gradient_descent(X, y_obs, theta, learning_rate=0.01, max_iterations=100, epsilon=0.1):
    '''
    Returns
    1. the final theta vector, dim(theta) = d+1
    2. a flag that is True if the loop stopped because loss <= epsilon,
       False if it stopped because it used up max_iterations

    Terminates when loss <= epsilon or when max_iterations is reached.
    '''
    num_iter = 0

    while (loss(theta, X, y_obs) > epsilon) and (num_iter < max_iterations):
        theta = theta - learning_rate * gradient_of_loss(theta, X, y_obs)
        num_iter += 1

    return theta, (num_iter < max_iterations)

Here is the code where I run it and compare it against the analytical solution:

import seaborn as sns

tips = sns.load_dataset("tips")
tips["bias"] = 1
X = tips[["total_bill", "bias"]]   # shape (244, 2)
y_obs = tips["tip"]                # shape (244,)
theta, converged = gradient_descent(X, y_obs, np.asarray([0, 0]),
                                    learning_rate=0.0001, max_iterations=1000)

theta_hat = np.linalg.solve(X.T @ X, X.T @ y_obs)
theta_hat
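
A quick sanity check worth adding here (a sketch reusing loss and theta_hat from above): evaluate the loss at the analytical optimum. If even that value exceeds epsilon = 0.1, the loss <= epsilon stopping rule can never fire, and the loop always runs until the iteration cap.

print(loss(theta_hat, X, y_obs))   # best achievable MSE; roughly 1.0 on this data, well above 0.1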

The theta from the analytical solution (theta_hat) is [0.1, 0.92], while the theta from my algorithm is

[0.142852 (total_bill), 0.021238 (bias)] for initial theta = [0, 0]

[0.103287 (total_bill), 0.961570 (bias)] for initial theta = [1, 1]

[0.024156 (total_bill), 2.842236 (bias)] for initial theta = [3, 3]

and so on.

Tags: python, linear-regression, matrix-multiplication, gradient-descent

Solution
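
The symptoms are consistent with the descent simply not having converged. With learning_rate = 0.0001 and max_iterations = 1000, theta only moves a short way from wherever it starts, so each initial theta is returned at a different intermediate point along its own path. The loss-based stopping rule never fires either: the smallest MSE achievable on this data (roughly 1.0, see the check above) is far above epsilon = 0.1, so converged is False in every run. Note that the start at [1, 1], which happens to lie near the optimum, already ends close to the analytical [0.1, 0.92], exactly what partial convergence predicts.

A more robust pattern is to stop when theta itself stops moving, i.e. when the gradient norm is small, and to allow enough iterations for that to happen. Below is a minimal sketch along those lines; gradient_descent_v2, the tolerance tol, and the particular learning rate and iteration cap are illustrative choices, not values from the question:

import numpy as np
import seaborn as sns

def gradient_descent_v2(X, y_obs, theta, learning_rate=0.001,
                        max_iterations=500_000, tol=1e-6):
    '''
    Same update rule as in the question, but stops when the gradient
    norm drops below tol (i.e. theta has stopped moving) rather than
    when the loss drops below an arbitrary epsilon.
    '''
    n = len(y_obs)
    for _ in range(max_iterations):
        grad = (X.T @ (X @ theta - y_obs)) / n   # same value as gradient_of_loss
        if np.linalg.norm(grad) < tol:           # converged
            return theta, True
        theta = theta - learning_rate * grad
    return theta, False                          # used up the iteration cap

tips = sns.load_dataset("tips")
tips["bias"] = 1.0
X = tips[["total_bill", "bias"]].to_numpy()      # plain arrays avoid pandas
y_obs = tips["tip"].to_numpy()                   # index-alignment surprises

# Every starting point should now end at (nearly) the same theta.
for start in ([0.0, 0.0], [1.0, 1.0], [3.0, 3.0]):
    theta, ok = gradient_descent_v2(X, y_obs, np.asarray(start))
    print(start, "->", theta, "converged:", ok)

print("analytical:", np.linalg.solve(X.T @ X, X.T @ y_obs))

With a gradient-norm stopping rule, all three starting points should land on (approximately) the same theta as np.linalg.solve, differing only at the level of the tolerance. Alternatively, keeping the original loop but raising max_iterations by a few orders of magnitude makes the results from different starts drift together, which is a quick way to confirm the diagnosis.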
