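gradient-descent - The gradient descent part of backpropagation
Problem description
I am trying to code a simple two-layer neural network, as I described here: https://itisexplained.com/html/NN/ml/5_codingneuralnetwork/
After computing the gradients for the outer and inner layers via backpropagation, I am stuck on the last step: updating the weights.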
#---------------------------------------------------------------
# Two-layer network, based on (1), with the equations we derived as explanations
# (1) http://iamtrask.github.io/2015/07/12/basic-python-network/
#---------------------------------------------------------------
import numpy as np
# seed random numbers to make calculation deterministic
np.random.seed(1)
# pretty print numpy array
np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
# let us code our sigmoid function
def sigmoid(x):
    return 1/(1+np.exp(-x))
# derivative of the sigmoid, written in terms of its output:
# x here is expected to be sigmoid(z), so sigmoid'(z) = x*(1-x)
def derv_sigmoid(x):
    return x*(1-x)
# set learning rate as 1 for this toy example
learningRate = 1
# input x, also used as the training set here
x = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
# desired output for each of the training set above
y = np.array([[0,1,1,0]]).T
# Explanation - the output is 1 when the input has exactly two ones (not one, not three)
"""
Input [0,0,1] Output = 0
Input [0,1,1] Output = 1
Input [1,0,1] Output = 1
Input [1,1,1] Output = 0
"""
input_rows = 4
# Randomly initialised weights
weight1 = np.random.random((3,input_rows))
weight2 = np.random.random((input_rows,1))
print("Shape weight1",np.shape(weight1)) #debug
print("Shape weight2",np.shape(weight2)) #debug
# Activation to layer 0 is taken as input x
a0 = x
iterations = 1000
for iter in range(0,iterations):
    # Forward pass - straightforward
    z1= x @ weight1
    a1 = sigmoid(z1)
    z2= a1 @ weight2
    a2 = sigmoid(z2)
    # Backward pass - backpropagation
    delta2 = (y-a2)
    #---------------------------------------------------------------
    # Calculating change of Cost/Loss w.r.t. weight of 2nd/last layer
    # Eq (A) ---> dC_dw2 = delta2*derv_sigmoid(z2)
    # (derv_sigmoid takes the activation a2 = sigmoid(z2) as its argument)
    #---------------------------------------------------------------
    dC_dw2 = delta2 * derv_sigmoid(a2)
    if iter == 0:
        print("Shape dC_dw2",np.shape(dC_dw2)) #debug
    #---------------------------------------------------------------
    # Calculating change of Cost/Loss w.r.t. weight of 1st/hidden layer
    # Eq (B)---> dC_dw1 = derv_sigmoid(a1)*delta2*derv_sigmoid(a2)*weight2
    # note delta2*derv_sigmoid(a2) == dC_dw2
    # dC_dw1 = derv_sigmoid(a1)*dC_dw2*weight2
    #---------------------------------------------------------------
    dC_dw1 = (np.multiply(dC_dw2,weight2.T)) * derv_sigmoid(a1)
    if iter == 0:
        print("Shape dC_dw1",np.shape(dC_dw1)) #debug
    #---------------------------------------------------------------
    # Gradient descent
    #---------------------------------------------------------------
    #weight2 = weight2 - learningRate*dC_dw2 --> this is what the textbook says
    #weight1 = weight1 - learningRate*dC_dw1
    weight2 = weight2 + learningRate*np.dot(a1.T,dC_dw2) # this is what works
    weight1 = weight1 + learningRate*np.dot(a0.T,dC_dw1)
print("New output\n",a2)
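Why is the update done as
weight2 = weight2 + learningRate*np.dot(a1.T,dC_dw2)
weight1 = weight1 + learningRate*np.dot(a0.T,dC_dw1)
instead of
#weight2 = weight2 - learningRate*dC_dw2
#weight1 = weight1 - learningRate*dC_dw1
I cannot find where the weight-update equation that multiplies by the previous layer's activation comes from, nor the intuition behind it. I hope the code is simple and self-explanatory.
I am asking here because the question involves code; it is also posted at https://ai.stackexchange.com/questions/26920/updating-the-weights-in-back-propagation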
Solution
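One way to see where the activation factor comes from is to compare shapes and check the gradient numerically. The sketch below is not taken from the question: it assumes the cost is C = 1/2 * sum((y - a2)^2), and the names loss, analytic_grads, numeric_grad and hidden are introduced only for illustration. Under that assumption, the quantities the question calls dC_dw2 and dC_dw1 are per-sample error terms (one row per training sample), not weight gradients. Because z2 = a1 @ weight2, the chain rule gives dC/dweight2 = a1.T @ (dC/dz2), which sums each sample's error weighted by the activation that fed that weight. The hidden size is set to 5 rather than 4 so the shapes cannot match by coincidence.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w1, w2, x, y):
    # Assumed cost: C = 1/2 * sum over samples of (y - a2)^2
    a1 = sigmoid(x @ w1)
    a2 = sigmoid(a1 @ w2)
    return 0.5 * np.sum((y - a2) ** 2)

def analytic_grads(w1, w2, x, y):
    a1 = sigmoid(x @ w1)
    a2 = sigmoid(a1 @ w2)
    # Per-sample error terms dC/dz, one row per training sample
    d2 = (a2 - y) * a2 * (1 - a2)        # shape (samples, 1)
    d1 = (d2 @ w2.T) * a1 * (1 - a1)     # shape (samples, hidden)
    # Weight gradients: transposed input activation times error term,
    # which also sums the contributions of all samples
    return x.T @ d1, a1.T @ d2

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences, perturbing one weight at a time in place
    g = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        up = f()
        w[idx] = old - eps
        down = f()
        w[idx] = old
        g[idx] = (up - down) / (2 * eps)
    return g

rng = np.random.default_rng(1)
x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0, 1, 1, 0]], dtype=float).T
hidden = 5  # deliberately different from the number of samples (4)
w1 = rng.random((3, hidden))
w2 = rng.random((hidden, 1))

g1, g2 = analytic_grads(w1, w2, x, y)
n1 = numeric_grad(lambda: loss(w1, w2, x, y), w1)
n2 = numeric_grad(lambda: loss(w1, w2, x, y), w2)

print("gradient shapes:", g1.shape, g2.shape)   # same shapes as w1 and w2
print("matches finite differences:",
      np.allclose(g1, n1, atol=1e-6), np.allclose(g2, n2, atol=1e-6))
```

Two things follow for the code in the question. First, the sign: delta2 there is (y - a2), the negative of (a2 - y) used above, so weight + learningRate*np.dot(a1.T, dC_dw2) is the same step as weight - learningRate*gradient. Second, the shapes: in the question weight1 is (3,4) while dC_dw1 is (4,4), so the commented-out element-wise update cannot even broadcast, and weight2 and dC_dw2 are both (4,1) only because the hidden size happens to equal the number of training samples.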