Back propagation on multi-layered neural networks

Problem description

I am building a neural network system in C# without using any libraries such as Accord.Net. But I am stuck on how to back-propagate my error. Do I have to include all the layers I have already propagated through, or does only the previous layer enter the equation?

Edit for more information:

My network structure is mostly dynamic. The user specifies how many layers to create and the node count per layer, and the input and output layers are sized based on the dataset used. Each layer can use a linear, sigmoid, tanh or ReLU activation function, and you can mix and match them per layer.

I do understand how backpropagation works and what it is for. But every example I see applies it to a 3-layer structure with one input, one hidden and one output layer. They calculate the output layer's error and update its weights, then calculate the hidden layer's error with the output layer included.

My problem starts here. These examples don't make it clear whether only the layer immediately before the hidden layer (going right to left, in back-propagation order) is included, or whether every layer up to the output layer enters the error equation.

For visualization

input layer ---> hidden layer 1 ---> hidden layer 2 ---> output layer

In this example, when I calculate hidden layer 1's error and weight update, do I only include hidden layer 2, or hidden layer 2 plus the output layer?

Tags: c#, neural-network, deep-learning, backpropagation

Solution


I wonder what you mean by "include". Backpropagation is supposed to compute the gradient. The gradient is the derivative of the loss function with respect to each variable (you call this the error, but that term is not quite precise: it is not an error, it is a slope). After the gradient has been computed, all parameters (the "weights") are updated at once.
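As a concrete illustration of "compute the whole gradient first, then update all weights at once", here is a minimal C# sketch; the array layout and the learning-rate parameter are my own assumptions, not taken from your code:

```
static class GradientDescentStep
{
    // After the backward pass has filled `gradients` (one matrix per layer,
    // with the same shape as the matching entry in `weights`), every weight
    // is updated in a single step: w := w - learningRate * dLoss/dw.
    public static void ApplyUpdate(double[][,] weights, double[][,] gradients, double learningRate)
    {
        for (int layer = 0; layer < weights.Length; layer++)
        {
            for (int i = 0; i < weights[layer].GetLength(0); i++)
            {
                for (int j = 0; j < weights[layer].GetLength(1); j++)
                {
                    weights[layer][i, j] -= learningRate * gradients[layer][i, j];
                }
            }
        }
    }
}
```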

Computing the gradient is essentially an application of the chain rule (reverse-mode differentiation along the computational graph). If you have a * b = c, and you know a, b, c and gradient(c), then it is easy to compute the gradients of a and b as well (gradient(a) = b * gradient(c), and gradient(b) = a * gradient(c)).
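That a * b = c example can be written out directly. The snippet below is only a toy illustration of the chain rule at a single multiply node; the upstream gradient value is made up for the example:

```
using System;

class MultiplyNodeGradient
{
    static void Main()
    {
        double a = 2.0, b = 3.0;
        double c = a * b;          // forward pass: c = 6

        // Pretend the loss L depends on c and the layer behind us has
        // already handed us dL/dc (value made up for this example).
        double gradC = 0.5;

        // Chain rule at the multiply node:
        //   dL/da = dc/da * dL/dc = b * gradC
        //   dL/db = dc/db * dL/dc = a * gradC
        double gradA = b * gradC;  // 1.5
        double gradB = a * gradC;  // 1.0

        Console.WriteLine($"c = {c}, grad a = {gradA}, grad b = {gradB}");
    }
}
```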

So you push the gradient backwards, layer by layer. For each layer you only need the gradient of the layer that follows it; everything further to the right has already been folded into that gradient. Frameworks such as TensorFlow do this automatically for you. The technique works for any computational graph, not just for neural networks with the structure you described. Understanding the general concept of differentiation along a computational graph first makes it easy to understand the special case of a neural network.
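Applied to your four-layer example: when you compute hidden layer 1's deltas you only look at hidden layer 2, because the output layer's influence is already contained in hidden layer 2's deltas. Below is a rough C# sketch of such a backward pass; the Layer class, the weight layout and the squared-error loss are assumptions for illustration, since your actual classes are not shown:

```
using System;
using System.Collections.Generic;

// Rough sketch only. Weight layout: Weights[i, j] connects neuron i of this
// layer to neuron j of the next layer; the output layer carries no weights.
class Layer
{
    public double[,] Weights;
    public double[] Outputs;                          // activations from the forward pass
    public double[] Deltas;                           // dLoss/dPreActivation, filled by Backward
    public Func<double, double> ActivationDerivative; // f'(x), expressed via the layer's output
}

static class Backprop
{
    public static void Backward(List<Layer> layers, double[] target)
    {
        // 1) Output layer: delta comes from the loss derivative
        //    (squared error assumed here) times the activation derivative.
        Layer output = layers[layers.Count - 1];
        output.Deltas = new double[output.Outputs.Length];
        for (int i = 0; i < output.Outputs.Length; i++)
        {
            double lossGrad = output.Outputs[i] - target[i];
            output.Deltas[i] = lossGrad * output.ActivationDerivative(output.Outputs[i]);
        }

        // 2) Hidden layers, right to left. Each layer only looks at the layer
        //    immediately after it: the output layer's influence is already
        //    contained in that layer's deltas.
        for (int l = layers.Count - 2; l >= 1; l--)
        {
            Layer current = layers[l];
            Layer next = layers[l + 1];
            current.Deltas = new double[current.Outputs.Length];

            for (int i = 0; i < current.Outputs.Length; i++)
            {
                double sum = 0.0;
                for (int j = 0; j < next.Deltas.Length; j++)
                    sum += current.Weights[i, j] * next.Deltas[j];

                current.Deltas[i] = sum * current.ActivationDerivative(current.Outputs[i]);
            }
        }

        // The gradient of the weight from neuron i in a layer to neuron j in
        // the next layer is then Outputs[i] * next.Deltas[j]; with those
        // gradients in hand, all weights are updated at once.
    }
}
```

Note that the loop stops at index 1: the input layer needs no deltas of its own, its stored outputs are only needed when forming the weight gradients of the first hidden layer.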

