What does the learning process in a neural network look like in terms of batch size and epochs?

Problem description

I am confused about the terms batch size and epoch, and about how the weights are updated during the process in a neural network.

I want to verify whether my understanding of the process, in the following order, is valid:

Consider that one training/data point has 8 features (8 input nodes).
I have 20 training/data points.
I choose a batch size of 2.
Now I want to make the model learn.

Executing the first epoch

Executing the first batch

    Data point 1: the 8 feature values go through the 8 input nodes.
        Random weights are initialised
        Forward Propagation happens
        Backward Propagation happens
        As a result of backward propagation, all the weights are updated.

    Data point 2: the 8 feature values go through the 8 input nodes.
        Forward propagation happens with the updated weights from the previous (Data point 1) backpropagation result.
        Backward propagation happens and all the weights are again updated.

Executing the second batch

    Data point 3: the 8 feature values go through the 8 input nodes.
        Forward propagation happens with the updated weights from the previous (Data point 2) backpropagation result.
        Backward propagation happens and all the weights are again updated.

This process continues until the first epoch ends.

Executing the second epoch

Executing the first batch
    Data point 1: the 8 feature values go through the 8 input nodes.
        No random weights this time. Forward propagation happens with the last updated weights (from the first epoch's last executed batch).
        Backward propagation happens and all the weights are again updated.

This process continues until the second epoch ends.

This process continues until the desired number of epochs is reached.
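
Written out as a loop, my understanding above would look roughly like this (just a toy sketch with a single linear layer and made-up array names, to show where I think the weight updates happen):

import numpy as np

np.random.seed(0)
X_train = np.random.rand(20, 8)   # 20 data points, 8 features each
y_train = np.random.rand(20, 1)
W = np.random.rand(8, 1)          # random weights initialised once, before data point 1
lr = 0.01
batch_size = 2
num_epochs = 2

for epoch in range(num_epochs):
    for start in range(0, len(X_train), batch_size):
        X_batch = X_train[start:start + batch_size]
        y_batch = y_train[start:start + batch_size]
        for x, y in zip(X_batch, y_batch):    # one data point at a time
            x = x.reshape(1, 8)
            out = x @ W                       # forward propagation
            grad = 2 * x.T @ (out - y)        # backward propagation: d(loss)/dW
            W = W - lr * grad                 # weights updated after EVERY data point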

Tags: tensorflow, machine-learning, theano

Solution


The mini-batch processing you describe is wrong: for one batch, the gradients are computed over the whole batch at once, all the per-example gradients are summed, and the weights are updated only once per batch.

Here is code illustrating the gradient computation d(loss)/d(W) for the simple example y = W * x, for a mini-batch and for single inputs:

import numpy as np
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 1])
Y = tf.placeholder(tf.float32, [None, 1])

W1 = tf.constant([[0.2]], dtype=tf.float32)
out = tf.matmul(X, W1)

loss = tf.square(out - Y)
# Calculate the error gradient with respect to the weights.
gradients = tf.gradients(loss, W1)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Giving individual inputs
    print(sess.run([gradients], {X: np.array([[0.1]]), Y: [[0.05]]}))
    # [[-0.006]]
    print(sess.run([gradients], {X: np.array([[0.2]]), Y: [[0.1]]}))
    # [[-0.024]]

    # Give a batch combining the above inputs
    print(sess.run([gradients], {X: np.array([[0.1], [0.2]]), Y: [[0.05], [0.1]]}))
    # [[-0.03]], which is the sum of the above gradients
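
To tie this back to the training loop in the question, here is a minimal NumPy sketch of the corrected process (made-up names and a plain linear model, not any particular framework's API): the gradients of a batch are summed and the weights are updated once per batch, not once per data point.

import numpy as np

np.random.seed(0)
X_train = np.random.rand(20, 8)   # 20 data points, 8 features each
y_train = np.random.rand(20, 1)
W = np.random.rand(8, 1)          # random weights initialised once, before the first epoch
lr = 0.01
batch_size = 2
num_epochs = 3

for epoch in range(num_epochs):
    for start in range(0, len(X_train), batch_size):
        X_batch = X_train[start:start + batch_size]   # shape (2, 8)
        y_batch = y_train[start:start + batch_size]   # shape (2, 1)
        out = X_batch @ W                             # forward pass for the whole batch
        grad = 2 * X_batch.T @ (out - y_batch)        # gradient summed over the batch
        W = W - lr * grad                             # ONE weight update per batch

With 20 data points and a batch size of 2, this performs 10 weight updates per epoch instead of 20.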
