python - Tensorflow Linear Regression NaN output
Problem description
I am implementing a machine learning algorithm in order to learn both machine learning concepts and TensorFlow. The algorithm I am trying to write up is:
(Not enough reputation to embed an image) https://i.imgur.com/lxgC7YV.png
"Which is equivalent to a piece wise linear regression model."
From (Equation 7):
https://arxiv.org/pdf/1411.3315.pdf
I've loaded the vectors I want to apply this to and initialised my placeholders and variables:
import tensorflow as tf  # TensorFlow 1.x graph API (tf.placeholder, tf.train)
size = len(originalVecs)
_x1 = tf.placeholder(tf.float64, shape=[size, 300], name="x1-input")
_x2 = tf.placeholder(tf.float64, shape=[size, 300], name="x2-input")
_w = tf.Variable(tf.random_uniform([300, 300], -1, 1, dtype=tf.float64), name="weight1")
I have defined my prediction, cost, and training step as:
prediction = tf.matmul(_x1,_w)
cost = tf.reduce_sum(tf.square(tf.norm(prediction - _x2)))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
After initialising, I train with the following:
for i in range(10000):
    sess.run(train_step, feed_dict={_x1: timedVecs, _x2: originalVecs})
    if i % 1001 == 0:
        print('Epoch ', i)
        print('Prediction ', sess.run(prediction, feed_dict={_x1: timedVecs, _x2: originalVecs}).shape)
When I run my code it is wildly unstable: the values grow within about 20 iterations until it just prints NaNs. I think I'm doing a couple of things wrong, but I do not know how to correct them.
The shape of the prediction is [20,300] when I would expect it to be [1,300]. I want it to predict from a single x1 and x2 at a time, rather than from all of them at once and then learn from the sum of the error over all data points (which I assume is what "piecewise" means). I'm not sure how to go about this, however, as I think I'm currently minimising based on the [20,300] matrix rather than on the sum of 20 [1,300] matrices.
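A quick NumPy check of the shape question, as a sketch. The sizes 20 and 300 come from the question, but the arrays and the seed are random stand-in data. Each row of the batched product is the prediction for one data point. The summed squared error over the [20,300] residual equals the sum of the 20 per-row squared errors, so minimising over the whole matrix already minimises the sum of the individual errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 300  # shapes from the question; the data itself is random stand-in data
x1 = rng.standard_normal((n, d))
x2 = rng.standard_normal((n, d))
w = rng.uniform(-1.0, 1.0, size=(d, d))

# Batched prediction: shape (20, 300), one row per data point.
prediction = x1 @ w

# Row i of the batch equals the single-sample prediction x1[i] @ w (shape (300,)).
per_sample = np.stack([x1[i] @ w for i in range(n)])

# Summed squared error over the whole residual matrix ...
batch_loss = np.sum((prediction - x2) ** 2)
# ... equals the sum of the 20 per-sample squared errors.
summed_loss = sum(np.sum((x1[i] @ w - x2[i]) ** 2) for i in range(n))

print(prediction.shape)                     # (20, 300)
print(np.allclose(prediction, per_sample))  # True
print(np.isclose(batch_loss, summed_loss))  # True
```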
I assume tf.matmul is correct here, since tf.multiply is element-wise?
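To illustrate the difference, here are the NumPy counterparts, np.matmul (the @ operator) and np.multiply (the * operator). tf.matmul and tf.multiply follow the same matrix-product and element-wise semantics. This is a small sketch with made-up 2x2 values, plus the question's shapes filled with ones.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0],
              [7.0, 8.0]])

matrix_product = np.matmul(a, b)  # same as a @ b: rows of a dotted with columns of b
elementwise = np.multiply(a, b)   # same as a * b: entry-by-entry product

print(matrix_product)  # values: [[19, 22], [43, 50]]
print(elementwise)     # values: [[5, 12], [21, 32]]

# With the question's shapes, only the matrix product is defined.
x1 = np.ones((20, 300))
w = np.ones((300, 300))
print(np.matmul(x1, w).shape)  # (20, 300)
try:
    np.multiply(x1, w)  # (20, 300) and (300, 300) cannot be broadcast together
except ValueError as err:
    print("element-wise product fails:", err)
```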
I am feeding my input data as a list of NumPy arrays, each array being one data point with 300 dimensions.
Thank you.
Solution
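In general, I would avoid square roots in the loss. The problem is that the derivative of x**0.5 is 0.5 * x**-0.5, which means dividing by x**0.5. If x is ever zero, this produces NaN. In this case, the square root comes from tf.norm and is immediately followed by tf.square, but these operations are not fused together and do not cancel out.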
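A sketch of why the two operations don't cancel. This is a hand-written NumPy version of the chain rule an autodiff framework would apply to tf.square(tf.norm(r)). It is not TensorFlow's actual gradient code, and the function names are made up for illustration. For a nonzero residual both gradients agree. When the residual is exactly zero, the norm's derivative becomes 0/0, and the resulting NaN survives multiplication by zero.

```python
import numpy as np

def grad_square_of_norm(r):
    """Chain rule for square(norm(r)), applied step by step without
    simplifying the expression first (hypothetical helper)."""
    norm = np.sqrt(np.sum(r ** 2))
    d_square = 2.0 * norm  # derivative of s**2 at s = norm
    d_norm = r / norm      # derivative of sqrt(sum(r**2)) w.r.t. r; 0/0 when r == 0
    return d_square * d_norm

def grad_sum_of_squares(r):
    """Gradient of sum(r**2): no square root, no division (hypothetical helper)."""
    return 2.0 * r

r = np.zeros(3)  # a residual that happens to be exactly zero
with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning
    print(grad_square_of_norm(r))    # [nan nan nan]
print(grad_sum_of_squares(r))        # [0. 0. 0.]
```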
Simplifying the loss expression to tf.reduce_sum(tf.square(prediction - _x2)) should make things more stable.
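A minimal NumPy sketch of plain gradient descent on that simplified loss, sum((x1 @ W - x2) ** 2), whose gradient is 2 * x1.T @ (x1 @ W - x2). It is a stand-in for the TensorFlow graph, not the asker's code. The function name, the synthetic data and the step-size rule are assumptions. The step size is derived from the data because gradient descent on this quadratic only converges when the step is below 1 / lambda_max(x1.T @ x1). That choice comes from this sketch, not from the answer above.

```python
import numpy as np

def train_linear_map(x1, x2, steps=500):
    """Gradient descent on loss(W) = sum((x1 @ W - x2) ** 2), the loss
    without tf.norm's square root (hypothetical helper)."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-1.0, 1.0, size=(x1.shape[1], x2.shape[1]))  # same init range as the question
    # Largest eigenvalue of x1.T @ x1 is the squared largest singular value of x1.
    lam_max = np.linalg.norm(x1, 2) ** 2
    lr = 0.5 / lam_max  # assumed step-size rule; keeps this quadratic from diverging
    losses = []
    for _ in range(steps):
        residual = x1 @ w - x2
        losses.append(np.sum(residual ** 2))
        grad = 2.0 * x1.T @ residual  # gradient of the summed squared error
        w -= lr * grad
    return w, losses

# Synthetic data that an exact linear map can reproduce (made up for illustration).
rng = np.random.default_rng(1)
x1 = rng.standard_normal((20, 300))
true_w = rng.standard_normal((300, 300)) / np.sqrt(300)
x2 = x1 @ true_w
w, losses = train_linear_map(x1, x2)
print(np.isfinite(losses).all())  # True
print(losses[-1] < losses[0])     # True
```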