Tensorflow Linear Regression NaN output

Problem description

I am trying to write code for a Machine Learning algorithm to learn both Machine Learning concepts and Tensorflow. The algorithm I am trying to write up is:

(Not enough reputation to embed an image) https://i.imgur.com/lxgC7YV.png

"Which is equivalent to a piece wise linear regression model."

From (Equation 7):

https://arxiv.org/pdf/1411.3315.pdf

I've loaded in the vectors I want to train on, and initialised my placeholders and variables:

size = len(originalVecs)
_x1 = tf.placeholder(tf.float64, shape=[size, 300], name="x1-input")
_x2 = tf.placeholder(tf.float64, shape=[size, 300], name="x2-input")

_w = tf.Variable(tf.random_uniform([300,300], -1, 1, dtype = tf.float64), name="weight1")

My prediction, cost, and training step are set as:

prediction = tf.matmul(_x1,_w)
cost = tf.reduce_sum(tf.square(tf.norm(prediction - _x2)))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

After initialising the variables, I train with the following:

for i in range(10000):
    sess.run(train_step, feed_dict={_x1: timedVecs, _x2 : originalVecs})
    if i % 1001 == 0:
        print('Epoch ', i)
        print('Prediction ', sess.run(prediction, feed_dict={_x1: timedVecs, _x2 : originalVecs}).shape)

When I run my code it is wildly unstable: within about 20 iterations the loss blows up and it just prints NaNs. I think I'm doing a couple of things wrong, but I don't know how to correct them.

The shape of the prediction is [20, 300], when I would expect it to be [1, 300]. I want it to predict based on a single x1 and x2 at a time, rather than predicting for all data points at once and learning from the sum of their errors (which is what I assume "piecewise" means). I'm not sure how to go about this, however, as I think I'm currently minimising based on the [20, 300] matrix rather than the sum of 20 [1, 300] matrices.

I assume matmul is the right choice here, since multiply is element-wise?

I am entering my input data as a list of np arrays, each array being a data point with 300 dimensions.

Thank you.

Tags: python, tensorflow, regression, linear-regression

Solution


Generally speaking, I would avoid square roots in losses. The problem is that the derivative of x**0.5 is 0.5 * x**-0.5, which means dividing by x**0.5. If x is ever zero, this will produce NaNs. In this case the square root comes from tf.norm and is immediately followed by tf.square, but these operations are not fused together and do not cancel out.
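
For example, the sketch below (a minimal TF1-style snippet with a hypothetical all-zero tensor x, not part of the original question) shows the gradient of tf.square(tf.norm(x)) evaluating to NaN as soon as the argument of the norm is zero:

import tensorflow as tf

x = tf.constant([0.0, 0.0], dtype=tf.float64)   # hypothetical input that makes the norm zero
loss = tf.square(tf.norm(x))                    # mathematically sum(x**2), but the graph still contains a sqrt
grad = tf.gradients(loss, x)[0]                 # backprop through the sqrt divides by sqrt(0)

with tf.Session() as sess:
    print(sess.run(grad))                       # prints [nan nan]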

Simplifying the loss expression to tf.reduce_sum(tf.square(prediction - _x2)) should make things more stable.
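
As a rough sketch (reusing the placeholder and variable names and the learning rate from the question), the revised graph would look like:

prediction = tf.matmul(_x1, _w)                        # [size, 300]
cost = tf.reduce_sum(tf.square(prediction - _x2))      # sum of squared errors, no sqrt in the graph
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

Mathematically this is the same objective, since tf.square(tf.norm(d)) equals tf.reduce_sum(tf.square(d)); the only change is that the sqrt op, and its problematic gradient, disappear from the graph.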

