Training the same model with the Keras Model API and the TensorFlow Estimator API produces different accuracies

Problem description

I've recently been experimenting with TensorFlow's higher-level APIs and have gotten some strange results: when I train what appears to be exactly the same model, with the same hyperparameters, using the Keras Model API and the TensorFlow Estimator API, I get different results (training with Keras gives ~4% higher accuracy).

Here is my code:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization, Activation, Flatten
from tensorflow.keras.initializers import VarianceScaling
from tensorflow.keras.optimizers import Adam

# Load CIFAR-10 dataset and normalize pixel values
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
X_train = np.array(X_train, dtype=np.float32)
y_train = np.array(y_train, dtype=np.int32).reshape(-1)
X_test = np.array(X_test, dtype=np.float32)
y_test = np.array(y_test, dtype=np.int32).reshape(-1)
mean = X_train.mean(axis=(0, 1, 2), keepdims=True)
std = X_train.std(axis=(0, 1, 2), keepdims=True)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
y_train_one_hot = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test_one_hot = tf.keras.utils.to_categorical(y_test, num_classes=10)



# Define forward pass for a convolutional neural network.
# This function takes a batch of images as input and returns
# unscaled class scores (aka logits) from the last layer
def conv_net(X):
    initializer = VarianceScaling(scale=2.0)

    X = Conv2D(filters=32, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)

    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)

    X = MaxPooling2D()(X)

    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)

    X = Conv2D(filters=128, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)

    X = Conv2D(filters=256, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)

    X = GlobalAveragePooling2D()(X)

    X = Dense(10)(X)

    return X



# For training this model I use the Adam optimizer with learning_rate=3e-3

# Train the model for 10 epochs using keras.Model API
def keras_model():
    inputs = Input(shape=(32,32,3))

    scores = conv_net(inputs)
    outputs = Activation('softmax')(scores)

    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=Adam(lr=3e-3), 
                  loss='categorical_crossentropy', 
                  metrics=['accuracy'])

    return model

model1 = keras_model()
model1.fit(X_train, y_train_one_hot, batch_size=128, epochs=10)
results1 = model1.evaluate(X_test, y_test_one_hot)
print(results1)
# The above usually gives 79-82% accuracy




# Now train the same model for 10 epochs using tf.estimator.Estimator API
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'X': X_train}, y=y_train, batch_size=128, num_epochs=10, shuffle=True)
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'X': X_test}, y=y_test, batch_size=128, num_epochs=1, shuffle=False)


def tf_estimator(features, labels, mode, params):
    X = features['X']

    scores = conv_net(X)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions={'scores': scores})

    # Per-example cross-entropy (a vector): it is passed to minimize() below
    # and only averaged for the loss reported in the EstimatorSpec
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=scores, labels=labels)

    metrics = {'accuracy': tf.metrics.accuracy(labels=labels, predictions=tf.argmax(scores, axis=-1))}

    optimizer = tf.train.AdamOptimizer(learning_rate=params['lr'], epsilon=params['epsilon'])
    step = optimizer.minimize(loss, global_step=tf.train.get_global_step())

    return tf.estimator.EstimatorSpec(mode=mode, loss=tf.reduce_mean(loss), train_op=step, eval_metric_ops=metrics)


model2 = tf.estimator.Estimator(model_fn=tf_estimator, params={'lr': 3e-3, 'epsilon': tf.keras.backend.epsilon()})
model2.train(input_fn=train_input_fn)
results2 = model2.evaluate(input_fn=test_input_fn)
print(results2)
# This usually gives 75-78% accuracy

print('Keras accuracy:', results1[1])
print('Estimator accuracy:', results2['accuracy'])

I have trained both of these models 30 times, for 10 epochs each: the model trained with Keras averages an accuracy of 0.8035, while the model trained with the Estimator averages 0.7631 (standard deviations of 0.0065 and 0.0072 respectively). Accuracy is significantly higher if I use Keras. My question is: why does this happen? Am I doing something wrong, or am I missing some important parameter? The architecture of the model is identical in both cases, and I use the same hyperparameters (I even set Adam's epsilon to the same value, although it doesn't really affect the overall result), yet the accuracies differ substantially.

I also wrote the training loop in raw TensorFlow and got the same accuracy as with the Estimator API (lower than what I get with Keras). This made me think that some parameter has a different default value in Keras than in TensorFlow, but they actually all appear to be the same.

I have also tried other architectures, and sometimes the accuracy gap was smaller, but I was unable to pin the difference down to any particular layer type. It looks as if the gap usually shrinks when I use a shallower network; however, that is not always the case. For example, with the following model the accuracy gap is even larger:

def simple_conv_net(X):
    initializer = VarianceScaling(scale=2.0)

    X = Conv2D(filters=32, kernel_size=5, strides=2, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)

    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)

    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)

    X = Flatten()(X)
    X = Dense(10)(X)

    return X

Again, I trained it 30 times for 10 epochs each, using the Adam optimizer with a learning rate of 3e-3. Keras averages an accuracy of 0.6561 and the Estimator averages 0.6101 (standard deviations of 0.0084 and 0.0111 respectively). What could be causing this difference?

Tags: tensorflow, keras, conv-neural-network, tensorflow-estimator

Solution

