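python - Gradient descent gives different final parameters for different initial parameters in matrix-form linear regression
Problem description
I implemented gradient descent for linear regression in matrix form, but I get a different final theta for each initial theta. I set a very small learning rate and a large maximum number of iterations, so I expected all the final thetas to be similar.
Here is my gradient descent code: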
import numpy as np

def loss(theta, X, y_obs):
    '''
    Calculates the MSE loss for given X and y.
    theta = vector of weights, dim(theta) = d+1
    X = design matrix with an added bias column, dim(X) = n x (d+1)
    y_obs = vector of observed labels, dim(y_obs) = n
    where n is the number of data points and d is the number of features
    Returns the scalar mean squared error
    '''
    y_hat = X @ theta  # predicted labels, dim(y_hat) = n
    return np.mean((y_hat - y_obs) ** 2)

def gradient_of_loss(theta, X, y_obs):
    '''
    Gradient of the loss wrt theta. This only works for linear regression;
    a different relationship between X and y_hat needs a different function.
    Returns dloss, dim(dloss) = d+1
    '''
    n = len(y_obs)
    # Note: the exact MSE gradient carries a factor 2/n; using 1/n only rescales the step size.
    return 1.0 / n * (theta.T @ X.T @ X - y_obs.T @ X)

def gradient_descent(X, y_obs, theta, learning_rate=0.01, max_iterations=100, epsilon=0.1):
    '''
    Returns
    1. the final theta vector, dim(theta) = d+1
    2. True if the loop stopped before exhausting the iteration budget, False otherwise
    Terminates if loss <= epsilon or the iteration limit is reached
    '''
    num_iter = 0
    # Note: `<=` allows max_iterations + 1 updates.
    while (loss(theta, X, y_obs) > epsilon) and (num_iter <= max_iterations):
        theta = theta - learning_rate * gradient_of_loss(theta, X, y_obs)
        num_iter += 1
    return theta, (num_iter <= max_iterations)
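One way to sanity-check a hand-derived gradient like the one above is to compare it with a finite-difference approximation of the loss. The sketch below does this on synthetic data (an assumption for illustration, not the tips dataset); the helper names `mse`, `grad_from_post` and `numerical_grad` are hypothetical. It suggests that the formula in the question is half of the true MSE gradient, which changes the effective step size but not the location of the minimum.

```python
import numpy as np

def mse(theta, X, y):
    """Mean squared error, same definition as loss() in the question."""
    return np.mean((X @ theta - y) ** 2)

def grad_from_post(theta, X, y):
    """Gradient formula from the question: (1/n) * (theta^T X^T X - y^T X)."""
    n = len(y)
    return (theta @ X.T @ X - y @ X) / n

def numerical_grad(f, theta, h=1e-6):
    """Central finite-difference approximation of the gradient of f at theta."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(len(theta)):
        step = np.zeros_like(theta, dtype=float)
        step[i] = h
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad

# Synthetic data: one feature with mean ~20 and std ~9, plus a bias column.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(20.0, 9.0, size=200), np.ones(200)])
y = 0.1 * X[:, 0] + 1.0 + rng.normal(0.0, 1.0, size=200)
theta = np.array([0.5, -0.3])

g_post = grad_from_post(theta, X, y)
g_num = numerical_grad(lambda t: mse(t, X, y), theta)
ratio = g_num / g_post
print(ratio)  # each entry should be close to 2
```

Because the factor is a constant, it is not by itself the cause of the start-dependent results; it only means a learning rate of 0.0001 here behaves like 0.00005 against the exact MSE gradient.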
Here is the code I use to run it and compare the result with the analytical solution:
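import seaborn as sns

tips = sns.load_dataset("tips")
tips["bias"] = 1
X = tips[["total_bill", "bias"]].to_numpy()  # shape (244, 2)
y_obs = tips["tip"].to_numpy()  # shape (244,)
# The second return value is a flag, not an iteration count
theta, converged = gradient_descent(X, y_obs, np.array([0.0, 0.0]), learning_rate=0.0001, max_iterations=1000)
# Analytical least-squares solution via the normal equations
theta_hat = np.linalg.solve(X.T @ X, X.T @ y_obs)
theta_hat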
The analytical solution (theta_hat) is [0.1, 0.92], while my algorithm returns:
[0.142852 (total_bill), 0.021238 (bias)] for initial theta = [0, 0]
[0.103287 (total_bill), 0.961570 (bias)] for initial theta = [1, 1]
[0.024156 (total_bill), 2.842236 (bias)] for initial theta = [3, 3]
and so on.
Solution
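No answer is recorded here, so what follows is only a plausible reading of the numbers in the question, not a confirmed solution. With an unscaled feature such as total_bill next to a column of ones, the Hessian X^T X / n is badly conditioned, and gradient descent contracts the error along each eigen-direction by a factor of (1 - learning_rate * eigenvalue) per step. With learning_rate=0.0001 and 1000 iterations, the direction that mostly carries the bias barely moves, so the final bias stays close to its starting value. That is consistent with the reported biases of roughly 0.02, 0.96 and 2.84 for starts of 0, 1 and 3. The sketch below illustrates this on synthetic data of a similar scale (an assumption, not the tips dataset) and shows one common remedy, standardizing the feature. The helper `unscale` and the simplified `gradient_descent` signature are hypothetical.

```python
import numpy as np

def gradient_descent(X, y, theta, learning_rate, max_iterations):
    """Plain batch gradient descent using the question's (1/n) gradient."""
    n = len(y)
    for _ in range(max_iterations):
        theta = theta - learning_rate * (X.T @ (X @ theta - y)) / n
    return theta

# Synthetic stand-in for the data: one feature with mean ~20 and std ~9.
rng = np.random.default_rng(0)
n = 244
x = rng.normal(20.0, 9.0, size=n)
X = np.column_stack([x, np.ones(n)])
y = 0.1 * x + 0.9 + rng.normal(0.0, 1.0, size=n)

theta_exact = np.linalg.solve(X.T @ X, X.T @ y)

# Eigenvalues of the Hessian X^T X / n (ascending order) set the per-step
# contraction factor (1 - lr * eigenvalue) along each eigen-direction.
eigvals = np.linalg.eigvalsh(X.T @ X / n)
lr = 1e-4
slow_factor = (1 - lr * eigvals[0]) ** 1000
print("fraction of error left in the slowest direction:", slow_factor)

starts = (0.0, 1.0, 3.0)

# Raw feature: after 1000 steps the result still depends on the start.
raw = [gradient_descent(X, y, np.full(2, s), lr, 1000) for s in starts]

# Standardized feature: the Hessian becomes the identity, so a much
# larger step is stable and every start reaches the same point.
x_std = (x - x.mean()) / x.std()
Xs = np.column_stack([x_std, np.ones(n)])
scaled = [gradient_descent(Xs, y, np.full(2, s), 0.1, 1000) for s in starts]

def unscale(t):
    """Map coefficients fitted on the standardized feature back to the raw scale."""
    slope = t[0] / x.std()
    return np.array([slope, t[1] - slope * x.mean()])

for r, s in zip(raw, scaled):
    print("raw:", r, "standardized:", unscale(s), "exact:", theta_exact)
```

Keeping the raw feature and running many more iterations with a learning rate closer to the stability limit (below 2 divided by the largest eigenvalue) should also work, only more slowly. Note too that the stopping test loss <= epsilon can never fire if the minimum achievable MSE is above epsilon, in which case the loop always runs to the iteration limit.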