Training in batch does not converge while training samples individually does

Question

I'm learning TensorFlow and trying some test models to get a feel for how TensorFlow works. The model is pretty simple (a linear model), and it has the following input and output:

X = tf.placeholder(tf.float32, shape=(None, 1), name="Input")
Y = tf.placeholder(tf.float32, shape=(None, 1), name="Output")

So both the input and the output have a single feature dimension. As for the training phase:

for epoch in range(training_epochs):
  for (x, y) in zip(trX, trY):
    # Feed one (x, y) pair per step; [x] has shape (1, 1) to match the placeholder.
    sess.run(train_op, feed_dict={X: [x], Y: [y]})

The above code trains the model one sample at a time, and it works. As a next step, I wanted to see how to train the model with the whole set in each epoch:

for epoch in range(training_epochs):
  # Feed the entire training set at once: X gets shape (101, 1).
  sess.run(train_op, feed_dict={X: trX, Y: trY})

Training this way does not converge; trX and trY both have shape (101, 1). Is my expectation wrong, or am I doing something wrong here?

Tags: python, tensorflow

Solution


It turns out the problem was in the definition of the cost function. Originally, it was defined as:

cost = tf.pow(Y - y_model, 2)  # elementwise: shape (None, 1), not a scalar

I realized it was missing the averaging component: the version above returns a matrix of per-sample errors rather than a single value. When the loss is not a scalar, TensorFlow implicitly sums its elements before computing gradients, so with a batch of 101 samples the gradient is effectively 101 times larger than in the one-sample case, and the same learning rate makes training diverge. The fix is to reduce the squared errors to their mean:

cost = tf.reduce_mean(tf.pow(Y - y_model, 2))  # scalar mean squared error
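
For reference, here is a minimal end-to-end sketch of the corrected batch training. The synthetic data, the w * x + b form of y_model, the learning rate, and the epoch count are my own assumptions to make the example self-contained; they are not from the original post.

import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x (or tf.compat.v1 with v2 behavior disabled)

# Synthetic data: 101 points on a noisy line, shaped (101, 1) like trX/trY.
trX = np.linspace(-1.0, 1.0, 101).reshape(-1, 1).astype(np.float32)
trY = 2.0 * trX + 0.1 * np.random.randn(*trX.shape).astype(np.float32)

X = tf.placeholder(tf.float32, shape=(None, 1), name="Input")
Y = tf.placeholder(tf.float32, shape=(None, 1), name="Output")

# Linear model: y = w * x + b (assumed form of y_model).
w = tf.Variable(0.0, name="weight")
b = tf.Variable(0.0, name="bias")
y_model = X * w + b

# Scalar loss: averaging keeps the gradient magnitude independent of batch size.
cost = tf.reduce_mean(tf.pow(Y - y_model, 2))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        # One gradient update on the whole training set per epoch.
        sess.run(train_op, feed_dict={X: trX, Y: trY})
    print(sess.run([w, b]))  # should approach [2.0, 0.0]

Note that tf.reduce_sum would also produce a scalar loss, but its gradient grows with the number of samples, so the learning rate would have to be scaled down to compensate; tf.reduce_mean keeps the effective step size the same whether you feed one sample or all 101.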
