python - Manually computing tanh in a TensorFlow Keras model yields NaN
Problem description
Below is a TF Keras model that uses the tanh activation function in its hidden layers.
Although the logits produced by the model are correct, the values computed by my manual implementation of the tanh function are NaN.
This is probably caused by the runtime warnings shown below:
/home/abc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:76: RuntimeWarning: overflow encountered in exp
/home/abc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:76: RuntimeWarning: invalid value encountered in true_divide
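These two warnings come from the naive exponential form of tanh: in float64, np.exp overflows to inf for arguments above roughly 709, and dividing the resulting inf by inf yields NaN. A minimal sketch reproducing both warnings (the errstate context only silences them so the values can be inspected):

```python
import numpy as np

# float64 overflows in exp for x above roughly 709.78; unnormalized
# MNIST pixels (0-255) fed through a dense layer can easily produce
# pre-activations far beyond that threshold.
with np.errstate(over="ignore", invalid="ignore"):
    big = np.float64(1000.0)
    num = np.exp(big) - np.exp(-big)  # inf  ("overflow encountered in exp")
    den = np.exp(big) + np.exp(-big)  # inf
    ratio = num / den                 # inf / inf = nan ("invalid value in true_divide")
print(num, den, ratio)
```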
The complete reproducible code follows:
import tensorflow as tf
import numpy as np
inputs = tf.keras.Input(shape=(784,), name="digits")
x1 = tf.keras.layers.Dense(64, activation="tanh")(inputs)
x2 = tf.keras.layers.Dense(64, activation="tanh")(x1)
outputs = tf.keras.layers.Dense(10, name="predictions")(x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
# Reserve 10,000 samples for validation.
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)
epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        x_batch_train = tf.cast(x_batch_train, tf.float32)
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)  # Logits for this minibatch
            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        Initial_Weights_1st_Hidden_Layer = model.trainable_weights[0]
        Initial_Weights_2nd_Hidden_Layer = model.trainable_weights[2]
        Initial_Weights_Output_Layer = model.trainable_weights[4]
        Initial_Bias_1st_Hidden_Layer = model.trainable_weights[1]
        Initial_Bias_2nd_Hidden_Layer = model.trainable_weights[3]
        Initial_Bias_Output_Layer = model.trainable_weights[5]
        # Implementing the tanh activation function using NumPy
        def Tanh_Activation(Input):
            return (np.exp(Input) - np.exp(-Input)) / (np.exp(Input) + np.exp(-Input))
        # Calculations
        Input_to_1st_Hidden_Layer = x_batch_train @ Initial_Weights_1st_Hidden_Layer + \
                                    Initial_Bias_1st_Hidden_Layer
        Output_Of_1st_Hidden_Layer = Tanh_Activation(Input_to_1st_Hidden_Layer)
        Input_to_2nd_Hidden_Layer = Output_Of_1st_Hidden_Layer @ Initial_Weights_2nd_Hidden_Layer + \
                                    Initial_Bias_2nd_Hidden_Layer
        Output_Of_2nd_Hidden_Layer = Tanh_Activation(Input_to_2nd_Hidden_Layer)
        Input_to_Final_Layer = Output_Of_2nd_Hidden_Layer @ Initial_Weights_Output_Layer + \
                               Initial_Bias_Output_Layer
        # No activation function is used in the output/final layer
        Calculated_Y_Pred = Input_to_Final_Layer
        # Log at step 200.
        if step == 200:
            print('\n Y_Pred = ', logits[0:2])
            print('\n Calculated_Y_Pred = ', Calculated_Y_Pred[0:2])
The output is shown below:
Start of epoch 0
/home/abc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:76: RuntimeWarning: overflow encountered in exp
/home/abc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:76: RuntimeWarning: invalid value encountered in true_divide
Y_Pred = tf.Tensor(
[[ 0.21055318 -0.22218612 -0.16623776 1.4846183 -0.85814655 -0.54121417
-0.64886147 -0.16928624 -0.07040396 1.2235574 ]
[-0.37760752 0.72542065 -0.13288006 0.26616174 -0.00855861 0.00906155
0.72031933 1.1708878 1.0362617 -0.9381638 ]], shape=(2, 10), dtype=float32)
Calculated_Y_Pred = tf.Tensor(
[[nan nan nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan nan nan]], shape=(2, 10), dtype=float32)
Start of epoch 1
Y_Pred = tf.Tensor(
[[ 1.3311301 -0.63776755 -0.99189854 0.04636261 -1.4317334 -0.261448
-0.5955114 0.60205513 -1.1979251 0.08551253]
[ 1.150329 0.10347857 -0.25470468 0.7521076 -1.4897512 0.15557133
-0.9681883 0.45576736 0.56690776 0.2748596 ]], shape=(2, 10), dtype=float32)
Calculated_Y_Pred = tf.Tensor(
[[nan nan nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan nan nan]], shape=(2, 10), dtype=float32)
Solution
Normalizing the inputs resolves the overflow:
x_train = np.reshape(x_train, (-1, 784)) / 255.0
x_test = np.reshape(x_test, (-1, 784)) / 255.0
Note that very small differences remain between the two computed results, because this dataset is processed in batches.
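Independently of normalizing the inputs, the manual activation itself can be made numerically stable by calling np.tanh instead of the hand-rolled exponential ratio. A sketch (standalone, not tied to the model code above):

```python
import numpy as np

def tanh_activation_stable(x):
    # np.tanh saturates gracefully for large |x|, whereas the naive
    # (e^x - e^-x) / (e^x + e^-x) form returns nan once e^x overflows.
    return np.tanh(x)

x = np.array([0.5, 1000.0])
with np.errstate(over="ignore", invalid="ignore"):
    naive = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
print(naive)                      # second entry is nan
print(tanh_activation_stable(x))  # ~[0.4621, 1.0], no nan
```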