tensorflow - What does the learning flow in a neural network look like in terms of batch size and epochs?
Problem description
I am confused about the terms batch size and epoch, and about how the weights are updated during training in a neural network.
I want to verify whether my understanding of the flow, in the following order, is valid.
Consider that one training/data point has 8 features (8 input nodes).
I have 20 training/data points.
I choose batch size of 2.
Now I want to make the model learn.
Executing first epoch
Executing first batch
Data point-1: the 8 feature values go through the 8 input nodes.
Random weights are initialised.
Forward propagation happens.
Backward propagation happens.
As a result of backward propagation, all the weights are updated.
Data point-2: the 8 feature values go through the 8 input nodes.
Forward propagation happens with the updated weights from the previous (Data point-1) backpropagation step.
Backward propagation happens and all the weights are updated again.
Executing second batch
Data point-3: the 8 feature values go through the 8 input nodes.
Forward propagation happens with the updated weights from the previous (Data point-2) backpropagation step.
Backward propagation happens and all the weights are updated again.
This process continues until the first epoch ends.
Executing second epoch
Executing the first batch
Data point-1: the 8 feature values go through the 8 input nodes.
No random weights this time. Forward propagation happens with the weights from the last backpropagation step (from the last batch executed in the first epoch).
Backward propagation happens and all the weights are updated again.
This process continues until the second epoch ends.
This process continues until the desired number of epochs is reached.
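The flow described above can be sketched in a few lines of NumPy. This is a toy per-sample update loop matching the question's steps (one forward and backward pass, and one weight update, per data point); the model, the data, and the learning rate are all hypothetical placeholders, not from the question:

```python
import numpy as np

# Toy setup: a linear model out = x @ W with squared loss,
# 20 data points of 8 features, batch size 2 (all values made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))   # 20 training points, 8 features each
y = rng.normal(size=(20, 1))
W = rng.normal(size=(8, 1))    # random weight initialisation
lr = 0.01

for epoch in range(2):
    for start in range(0, len(X), 2):            # batches of size 2
        for i in range(start, start + 2):        # the flow described above:
            x_i = X[i:i+1]                       # one data point at a time
            out = x_i @ W                        # forward propagation
            grad = 2 * x_i.T @ (out - y[i:i+1])  # backward propagation
            W -= lr * grad                       # weights updated per point
```

With 20 points this performs 20 weight updates per epoch, one per data point, regardless of the batch size; the answer below explains that with mini-batches the update actually happens once per batch.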
Solution
The processing of a mini-batch described above is wrong: for one batch, we compute the gradients for the whole batch at once, then sum them, and the weights are updated once per batch.
Here is code illustrating the gradient computation d(loss)/d(W) for the simple example y = W * x, for a mini-batch and for a single input:
import numpy as np
import tensorflow as tf  # TF 1.x style API (tf.placeholder, tf.Session)

X = tf.placeholder(tf.float32, [None, 1])
Y = tf.placeholder(tf.float32, [None, 1])
W1 = tf.constant([[0.2]], dtype=tf.float32)
out = tf.matmul(X, W1)
loss = tf.square(out - Y)

# Calculate the error gradient with respect to the weights.
gradients = tf.gradients(loss, W1)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Feeding individual inputs
    print(sess.run([gradients], {X: np.array([[0.1]]), Y: [[0.05]]}))
    # [[-0.006]]
    print(sess.run([gradients], {X: np.array([[0.2]]), Y: [[0.1]]}))
    # [[-0.024]]

    # Feeding a batch combining the above inputs
    print(sess.run([gradients], {X: np.array([[0.1], [0.2]]), Y: [[0.05], [0.1]]}))
    # [[-0.03]], which is the sum of the individual gradients above
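To connect this back to the question's setup, the corrected training flow can be sketched in plain NumPy: the batch gradient is the sum of the per-point gradients (as the TensorFlow snippet above demonstrates), and the weights are updated once per batch, giving 10 updates per epoch for 20 points with batch size 2. The linear model, data, and learning rate here are hypothetical placeholders:

```python
import numpy as np

# Toy setup: linear model out = x @ W with squared loss (all values made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))   # 20 training points, 8 features each
y = rng.normal(size=(20, 1))
W = rng.normal(size=(8, 1))    # random weight initialisation
lr = 0.01

for epoch in range(2):
    for start in range(0, len(X), 2):
        xb, yb = X[start:start+2], y[start:start+2]  # one batch of 2 points
        out = xb @ W                   # forward pass on the whole batch
        grad = 2 * xb.T @ (out - yb)   # summed gradient over the batch
        W -= lr * grad                 # a single weight update per batch
```

Averaging the summed gradient by the batch size (dividing `grad` by 2 here) is also common; it only rescales the effective learning rate.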