About backpropagation deep neural networks in TensorFlow

Problem description

I have been reading about backpropagation in deep neural networks, and as far as I understand, the algorithm of this kind of network can be summarized as follows:

1- Input x: set the corresponding activation for the input layer

2- Feedforward: compute the activations of the forward pass

3- Output error: compute the output error

4- Backpropagate the error: compute the backpropagated error

5- Output: the gradient of the cost function
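
For reference, these steps correspond to the standard backpropagation equations (the notation here is the common textbook one, with C the cost, \sigma the activation, and \delta^l the error of layer l; it is not taken from any specific code):

z^l = w^l a^{l-1} + b^l,  a^l = \sigma(z^l)                          (feedforward)
\delta^L = \nabla_a C \odot \sigma'(z^L)                             (output error)
\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)             (backpropagated error)
\partial C/\partial w^l_{jk} = a^{l-1}_k \delta^l_j,  \partial C/\partial b^l_j = \delta^l_j    (gradient of the cost)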

That is fine. I then examined a lot of code for this type of deep network; below is an example, with explanation:

### imports
import tensorflow as tf

### constant data
x  = [[0.,0.],[1.,1.],[1.,0.],[0.,1.]]
y_ = [[0.],[0.],[1.],[1.]]

### induction
# 1x2 input -> 2x3 hidden sigmoid -> 3x1 sigmoid output

# Layer 0 = the x2 inputs
x0 = tf.constant( x  , dtype=tf.float32 )
y0 = tf.constant( y_ , dtype=tf.float32 )

# Layer 1 = the 2x3 hidden sigmoid
m1 = tf.Variable( tf.random_uniform( [2,3] , minval=0.1 , maxval=0.9 , dtype=tf.float32  ))
b1 = tf.Variable( tf.random_uniform( [3]   , minval=0.1 , maxval=0.9 , dtype=tf.float32  ))
h1 = tf.sigmoid( tf.matmul( x0,m1 ) + b1 )

# Layer 2 = the 3x1 sigmoid output
m2 = tf.Variable( tf.random_uniform( [3,1] , minval=0.1 , maxval=0.9 , dtype=tf.float32  ))
b2 = tf.Variable( tf.random_uniform( [1]   , minval=0.1 , maxval=0.9 , dtype=tf.float32  ))
y_out = tf.sigmoid( tf.matmul( h1,m2 ) + b2 )


### loss
# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum( tf.square( y0 - y_out ) )

# training step : gradient descent (learning rate 1.0) to minimize loss
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)


### training
# run 500 times using all the X and Y
# print out the loss and any other interesting info
with tf.Session() as sess:
  sess.run( tf.global_variables_initializer() )
  for step in range(500) :
    sess.run(train)

  results = sess.run([m1,b1,m2,b2,y_out,loss])
  labels  = "m1,b1,m2,b2,y_out,loss".split(",")
  for label, result in zip(labels, results):
    print("")
    print(label)
    print(result)

print ""

My question: the code above computes the forward pass and its error, but I do not see any step that computes the backpropagated error. In other words, following the description above, I can see steps 1 (input x), 2 (feedforward), 3 (output error), and 5 (output), but step 4 (backpropagate the error) does not appear anywhere in the code!! Is this correct, or is something missing from the code? The problem is that all the code I find online follows these same steps for backpropagation deep neural networks! Could you please describe how the backpropagation step happens in this code, or what I should add to perform that step?

Thanks

Tags: python, tensorflow, deep-learning

Solution


In simple terms, once you have built the TF graph up to the point where the loss is computed, TF knows which tf.Variables (weights) the loss depends on. When you then create the node train = tf.train.GradientDescentOptimizer(1.0).minimize(loss) and later run it in a tf.Session, backpropagation is done for you in the background. More specifically, train = tf.train.GradientDescentOptimizer(1.0).minimize(loss) merges the following steps:

# 1. Create a GD optimizer with a learning rate of 1.0
optimizer = tf.train.GradientDescentOptimizer(1.0)
# 2. Compute the gradients for each of the variables (weights) with respect to the loss
gradients, variables = zip(*optimizer.compute_gradients(loss))
# 3. Update the variables (weights) based on the computed gradients
train = optimizer.apply_gradients(zip(gradients, variables))

In particular, steps 1 and 2 summarize the backpropagation step. Hope this makes things clearer for you!
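
If you want to see the backpropagated gradients explicitly, you can fetch the gradient tensors yourself. Here is a minimal sketch that reuses the loss from your graph (the grads_and_vars name is just illustrative):

optimizer = tf.train.GradientDescentOptimizer(1.0)
# compute_gradients returns a list of (gradient, variable) pairs;
# evaluating the gradient tensors runs the backward pass
grads_and_vars = optimizer.compute_gradients(loss)
train = optimizer.apply_gradients(grads_and_vars)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  grad_values = sess.run([g for g, v in grads_and_vars])
  print(grad_values)  # one gradient array per variable: m1, b1, m2, b2

Running the train node then performs both the gradient computation and the weight update in a single step.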


In addition, let me restructure the steps in your question:

  1. Input X: The input of the neural network.
  2. Forward pass: Propagating the input through the neural network in order to get the output. In other words, multiplying the input X with each of the tf.Variables in your code.
  3. Loss: The mismatch between the obtained output in step 2 and the expected output.
  4. Computing the gradients: Computing the gradients for each of the tf.Variable (weights) with respect to the loss.
  5. Updating the weights: Updating each tf.Variable (weight) according to its corresponding gradient.

Please note that steps 4 and 5 encapsulate backpropagation.
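
To make steps 4 and 5 concrete, here is a minimal NumPy sketch of what TF computes for you behind the scenes, for the exact 2-3-1 sigmoid network and sum-of-squares loss in your code (the seed and step count are just illustrative; this shows the chain rule, not TF's actual implementation):

import numpy as np

def sigmoid(z):
  return 1.0 / (1.0 + np.exp(-z))

x  = np.array([[0.,0.],[1.,1.],[1.,0.],[0.,1.]])
y_ = np.array([[0.],[0.],[1.],[1.]])

rng = np.random.default_rng(0)                  # illustrative seed
m1 = rng.uniform(0.1, 0.9, (2, 3)); b1 = rng.uniform(0.1, 0.9, 3)
m2 = rng.uniform(0.1, 0.9, (3, 1)); b2 = rng.uniform(0.1, 0.9, 1)

for step in range(500):
  # steps 1-2: forward pass
  h1    = sigmoid(x @ m1 + b1)
  y_out = sigmoid(h1 @ m2 + b2)
  # step 3: loss = sum of the squares of y_ - y_out
  loss = np.sum((y_ - y_out) ** 2)
  # step 4: backpropagate the error (chain rule, output layer first)
  delta2 = 2.0 * (y_out - y_) * y_out * (1.0 - y_out)   # output-layer error
  delta1 = (delta2 @ m2.T) * h1 * (1.0 - h1)            # hidden-layer error
  # step 5: gradient-descent update with learning rate 1.0, as in your code
  m2 -= h1.T @ delta2;  b2 -= delta2.sum(axis=0)
  m1 -= x.T @ delta1;   b1 -= delta1.sum(axis=0)

print(loss)

This is exactly the work that optimizer.compute_gradients (step 4) and optimizer.apply_gradients (step 5) do for you in the TF version.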

