Manually calculated tanh in a TensorFlow Keras model produces NaN

Problem description

Please find below a TF Keras Model that uses the tanh activation function in its Hidden Layers.

While the values of the logits are correct, the values calculated through the manual implementation of the tanh function are NaN.

This is probably caused by the Runtime warnings shown below:

/home/abc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:76: RuntimeWarning: overflow encountered in exp

/home/abc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:76: RuntimeWarning: invalid value encountered in true_divide
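
Both warnings come from NumPy's exp. A minimal standalone sketch (not part of the original code) reproduces them with the kind of magnitudes that unnormalized 0-255 MNIST pixels feed into the first layer:

import numpy as np

# float32 exp overflows to inf for inputs above roughly 88, and raw pixel
# values (0-255) pushed through a Dense layer easily exceed that.
x = np.array([100.0], dtype=np.float32)
print(np.exp(x))  # [inf], with "RuntimeWarning: overflow encountered in exp"

# The naive tanh formula then evaluates inf / inf, which is NaN:
print((np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x)))
# [nan], with "RuntimeWarning: invalid value encountered in true_divide"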

The complete reproducible code is as follows:

import tensorflow as tf
import numpy as np


inputs = tf.keras.Input(shape=(784,), name="digits")
x1 = tf.keras.layers.Dense(64, activation="tanh")(inputs)
x2 = tf.keras.layers.Dense(64, activation="tanh")(x1)
outputs = tf.keras.layers.Dense(10, name="predictions")(x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
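# from_logits=True because the model's final Dense layer applies no softmax
# and returns raw logits.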
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))

# Reserve 10,000 samples for validation.
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        
        x_batch_train = tf.cast(x_batch_train, tf.float32)
                
        with tf.GradientTape() as tape:
            
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)
        
        grads = tape.gradient(loss_value, model.trainable_weights)
        
        Initial_Weights_1st_Hidden_Layer = model.trainable_weights[0]
        
        Initial_Weights_2nd_Hidden_Layer = model.trainable_weights[2]
        
        Initial_Weights_Output_Layer = model.trainable_weights[4]
                
        Initial_Bias_1st_Hidden_Layer = model.trainable_weights[1]
        
        Initial_Bias_2nd_Hidden_Layer = model.trainable_weights[3]
        
        Initial_Bias_Output_Layer = model.trainable_weights[5]
        
        # Implementing Tanh Activation Function using Numpy
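        # Note: np.exp overflows to inf for float32 inputs above roughly 88,
        # so with unnormalized 0-255 pixel inputs this naive formula ends up
        # computing inf / inf = NaN, which is what the two RuntimeWarnings
        # above report.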
        def Tanh_Activation(Input):
            return ((np.exp(Input)-np.exp(-Input))/(np.exp(Input)+np.exp(-Input)))
                    
        # Calculations
        Input_to_1st_Hidden_Layer = x_batch_train @ Initial_Weights_1st_Hidden_Layer + \
                                    Initial_Bias_1st_Hidden_Layer
                     
        Output_Of_1st_Hidden_Layer = Tanh_Activation(Input_to_1st_Hidden_Layer)
        
        Input_to_2nd_Hidden_Layer = Output_Of_1st_Hidden_Layer @ Initial_Weights_2nd_Hidden_Layer + \
                                    Initial_Bias_2nd_Hidden_Layer
                   
        Output_Of_2nd_Hidden_Layer = Tanh_Activation(Input_to_2nd_Hidden_Layer)
      
        Input_to_Final_Layer = Output_Of_2nd_Hidden_Layer @ Initial_Weights_Output_Layer + \
                               Initial_Bias_Output_Layer
        
        # No Activation Function has been used in the Output/Final Layer
        Calculated_Y_Pred = Input_to_Final_Layer

        # Log at batch 200 of each epoch.
        if step == 200:
            print('\n Y_Pred = ', logits[0:2])
            print('\n Calculated_Y_Pred = ', Calculated_Y_Pred[0:2])

The output is shown below:

Start of epoch 0
/home/abc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:76: RuntimeWarning: overflow encountered in exp
/home/abc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:76: RuntimeWarning: invalid value encountered in true_divide

Y_Pred =  tf.Tensor(
[[ 0.21055318 -0.22218612 -0.16623776  1.4846183  -0.85814655 -0.54121417
  -0.64886147 -0.16928624 -0.07040396  1.2235574 ]
 [-0.37760752  0.72542065 -0.13288006  0.26616174 -0.00855861  0.00906155
   0.72031933  1.1708878   1.0362617  -0.9381638 ]], shape=(2, 10), dtype=float32)

 Calculated_Y_Pred =  tf.Tensor(
[[nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]], shape=(2, 10), dtype=float32)

Start of epoch 1

 Y_Pred =  tf.Tensor(
[[ 1.3311301  -0.63776755 -0.99189854  0.04636261 -1.4317334  -0.261448
  -0.5955114   0.60205513 -1.1979251   0.08551253]
 [ 1.150329    0.10347857 -0.25470468  0.7521076  -1.4897512   0.15557133
  -0.9681883   0.45576736  0.56690776  0.2748596 ]], shape=(2, 10), dtype=float32)

 Calculated_Y_Pred =  tf.Tensor(
[[nan nan nan nan nan nan nan nan nan nan]
 [nan nan nan nan nan nan nan nan nan nan]], shape=(2, 10), dtype=float32)

Tags: python, numpy, tensorflow, keras

Solution


Normalizing the input fixed the overflow problem:

x_train = np.reshape(x_train, (-1, 784)) / 255.0
x_test = np.reshape(x_test, (-1, 784)) / 255.0
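# Dividing by 255.0 rescales the 0-255 pixel values into [0, 1], which keeps
# the layer pre-activations well below the float32 exp overflow threshold.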

Note that very small differences between the calculated results still remain, because this dataset is processed in batches.
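
Beyond the original fix, the NaNs can also be avoided at the source with a numerically stable tanh: NumPy already ships one as np.tanh, and an equivalent closed form only ever exponentiates non-positive values. A minimal sketch (stable_tanh is an illustrative helper, not from the original code):

import numpy as np

def stable_tanh(x):
    # tanh(x) = sign(x) * (1 - e^(-2|x|)) / (1 + e^(-2|x|)); the exponent
    # is always <= 0, so np.exp can never overflow.
    e = np.exp(-2.0 * np.abs(x))
    return np.sign(x) * (1.0 - e) / (1.0 + e)

x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0], dtype=np.float32)
print(stable_tanh(x))  # [-1. -0.7615942  0.  0.7615942  1.], no warnings
print(np.tanh(x))      # same values; the built-in is the practical choice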

