python - SGD implementation in Python
Problem description
I know SGD has been asked about on SO before, but I would like feedback on my code, shown below:
import numpy as np
import matplotlib.pyplot as plt
# Generating data
m,n = 10000,4
x = np.random.normal(loc=0,scale=1,size=(m,4))
theta_0 = 2
theta = np.append([],[1,0.5,0.25,0.125]).reshape(n,1)
y = np.matmul(x,theta) + theta_0*np.ones(m).reshape((m,1)) + np.random.normal(loc=0,scale=0.25,size=(m,1))
# input features
x0 = np.ones([m,1])
X = np.append(x0,x,axis=1)
# defining the cost function
def compute_cost(X,y,theta_GD):
    return np.sum(np.power(y-np.matmul(np.transpose(theta_GD),X),2))/2
# initializations
theta_GD = np.append([theta_0],[theta]).reshape(n+1,1)
alp = 1e-5
num_iterations = 10000
# Batch Sum
def batch(i,j,theta_GD):
    batch_sum = 0
    for k in range(i,i+9):
        batch_sum += float((y[k]-np.transpose(theta_GD).dot(X[k]))*X[k][j])
    return batch_sum
# Gradient Step
def gradient_step(theta_current, X, y, alp,i):
    for j in range(0,n):
        theta_current[j]-= alp*batch(i,j,theta_current)/10
    theta_updated = theta_current
    return theta_updated
# gradient descent
cost_vec = []
for i in range(num_iterations):
    cost_vec.append(compute_cost(X[i], y[i], theta_GD))
    theta_GD = gradient_step(theta_GD, X, y, alp,i)
plt.plot(cost_vec)
plt.xlabel('iterations')
plt.ylabel('cost')
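I am trying mini-batch GD with a batch size of 10. My MSE shows extremely oscillatory behavior. Where is the problem? Thanks.
PS: I am following Ng's lecture at https://www.coursera.org/learn/machine-learning/lecture/9zJUs/mini-batch-gradient-descent
Solution
This is a description of the underlying mathematical principles, not a code-based solution...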
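The cost function is highly nonlinear because of np.power(). In mathematics, this falls under chaos theory / the theory of nonlinear dynamical systems (https://pdfs.semanticscholar.org/8e0d/ee3c433b1806bfa0d98286836096f8c2681d.pdf); see the logistic map (https://en.wikipedia.org/wiki/Logistic_map). If the growth factor r exceeds a threshold, the logistic map oscillates. The growth factor is a measure of how much energy is in the system.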
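To make the threshold behavior of the logistic map concrete, here is a minimal sketch. The helper names (`logistic_map`, `tail_of_orbit`) and the two values of r are chosen only for illustration. Below r = 3 the orbit settles on the fixed point 1 - 1/r; just above it, the orbit alternates between two values.

```python
def logistic_map(x, r):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1 - x)


def tail_of_orbit(r, x0=0.5, burn_in=1000, keep=4):
    """Iterate past the transient, then return the next `keep` values."""
    x = x0
    for _ in range(burn_in):
        x = logistic_map(x, r)
    tail = []
    for _ in range(keep):
        x = logistic_map(x, r)
        tail.append(x)
    return tail


settled = tail_of_orbit(2.8)      # r below 3: values repeat at 1 - 1/r
oscillating = tail_of_orbit(3.2)  # r above 3: values alternate between two levels
print(settled)
print(oscillating)
```

The only thing that changes between the two runs is r, which is the sense in which a single factor decides whether the system oscillates.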
In your code, the critical parts are the cost function, the cost vector (that is, the history of the system), and the time step:
def compute_cost(X,y,theta_GD):
    return np.sum(np.power(y-np.matmul(np.transpose(theta_GD),X),2))/2
cost_vec = []
for i in range(num_iterations):
    cost_vec.append(compute_cost(X[i], y[i], theta_GD))
    theta_GD = gradient_step(theta_GD, X, y, alp,i)
# Gradient Step
def gradient_step(theta_current, X, y, alp,i):
    for j in range(0,n):
        theta_current[j]-= alp*batch(i,j,theta_current)/10
    theta_updated = theta_current
    return theta_updated
If you compare this with an implementation of the logistic map, you will see the similarities:
from pylab import show, scatter, xlim, ylim
from random import randint
n_iter = 1000 # Number of iterations per point
seed = 0.5 # Seed value for x in (0, 1)
spacing = .0001 # Spacing between points on domain (r-axis)
res = 8 # Largest n-cycle visible
# Initialize r and x lists
rlist = []
xlist = []
def logisticmap(x, r):  # <--- nonlinear function
    return x * r * (1 - x)
# Return nth iteration of logisticmap(x, r)
def iterate(n, x, r):
    for i in range(1,n):
        x = logisticmap(x, r)
    return x
# Generate list values -- iterate for each value of r
for r in [i * spacing for i in range(int(1/spacing),int(4/spacing))]:
    rlist.append(r)
    # <--- similar to cost_vector, the history of the system
    # (integer division so that randint receives int bounds on Python 3)
    xlist.append(iterate(randint(n_iter-res//2,n_iter+res//2), seed, r))
scatter(rlist, xlist, s = .01)
xlim(0.9, 4.1)
ylim(-0.1,1.1)
show()
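Based on this, you can try to modify the cost function by introducing a factor analogous to the growth factor in the logistic map, in order to reduce the intensity of the system's oscillations: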
def gradient_step(theta_current, X, y, alp,i):
    for j in range(0,n):
        # <--- introduce a factor somewhere to keep the system under the oscillation threshold
        theta_current[j]-= alp*batch(i,j,theta_current)/10
    theta_updated = theta_current
    return theta_updated
or
def compute_cost(X,y,theta_GD):
    # <--- introduce a factor somewhere to keep the system under the oscillation threshold
    return np.sum(np.power(y-np.matmul(np.transpose(theta_GD),X),2))/2
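One candidate for such a factor is the step size itself (alp in your code). The following is a toy sketch, not your model: it runs gradient descent on the one-dimensional quadratic f(theta) = 0.5 * a * theta**2, where each update multiplies theta by (1 - alpha * a). The function name `run_gd` and all parameter values are assumptions made for illustration.

```python
def run_gd(alpha, a=1.0, theta0=1.0, steps=50):
    """Gradient descent on f(theta) = 0.5 * a * theta**2 (gradient: a * theta)."""
    theta = theta0
    history = [theta]
    for _ in range(steps):
        theta = theta - alpha * a * theta  # equivalent to theta *= (1 - alpha * a)
        history.append(theta)
    return history


smooth = run_gd(alpha=0.5)     # 0 < alpha*a < 1: decays toward 0 without sign changes
damped = run_gd(alpha=1.5)     # 1 < alpha*a < 2: sign flips every step, magnitude shrinks
diverging = run_gd(alpha=2.5)  # alpha*a > 2: sign flips every step, magnitude grows
print(smooth[-1], damped[-1], diverging[-1])
```

For this quadratic the threshold is alpha = 2 / a. For a least-squares cost in several dimensions the role of a is played by the largest curvature of the cost, so the usable step size depends on the scale of the features.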
If this does not work, follow the suggestions in https://www.reddit.com/r/MachineLearning/comments/3y9gkj/how_can_i_avoid_oscillations_in_gradient_descent/ (time step, ...)
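For reference, here is a minimal vectorized sketch of mini-batch gradient descent on data shaped like yours. It is one possible arrangement under stated assumptions, not a definitive implementation: the names `mse_cost` and `minibatch_gd`, the step size 0.01, the zero initialization, the per-epoch shuffling, and the choice to track the cost over the full data set are all my own. Note two details that differ from the code in the question: there, range(i,i+9) visits 9 rows rather than 10, and range(0,n) updates only 4 of the 5 parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with the same layout as the question: intercept plus 4 features
m, n = 10000, 4
x = rng.normal(loc=0.0, scale=1.0, size=(m, n))
true_theta = np.array([2.0, 1.0, 0.5, 0.25, 0.125]).reshape(n + 1, 1)
X = np.hstack([np.ones((m, 1)), x])
y = X @ true_theta + rng.normal(loc=0.0, scale=0.25, size=(m, 1))


def mse_cost(X, y, theta):
    """Half mean squared error over all rows of X."""
    residual = X @ theta - y
    return float(np.sum(residual ** 2)) / (2 * len(y))


def minibatch_gd(X, y, alpha=0.01, batch_size=10, epochs=2):
    """Mini-batch gradient descent; returns final theta and the cost after each step."""
    theta = np.zeros((X.shape[1], 1))  # assumption: start away from the true values
    costs = []
    for _ in range(epochs):
        order = rng.permutation(len(y))  # assumption: reshuffle rows every epoch
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)  # gradient of the batch cost
            theta = theta - alpha * grad                # step against the gradient
            costs.append(mse_cost(X, y, theta))         # cost on the full data set
    return theta, costs


theta_hat, costs = minibatch_gd(X, y)
print(theta_hat.ravel())
print(costs[0], costs[-1])
```

Evaluating the cost on a single row, as compute_cost(X[i], y[i], theta_GD) does, produces a much noisier curve than evaluating it on all rows, so part of the visible oscillation can come from how the cost is measured rather than from the updates. The `costs` list here can be passed to plt.plot in the same way as cost_vec.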