tensorflow - Training the same model with the Keras Model API and the TensorFlow Estimator API produces different accuracies
Question
I have recently been experimenting with TensorFlow's higher-level APIs and got some strange results: when I train what appears to be exactly the same model, with the same hyperparameters, using the Keras Model API and the TensorFlow Estimator API, I get different results (Keras yields ~4% higher accuracy).
Here is my code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization, Activation, Flatten
from tensorflow.keras.initializers import VarianceScaling
from tensorflow.keras.optimizers import Adam
# Load CIFAR-10 dataset and normalize pixel values
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
X_train = np.array(X_train, dtype=np.float32)
y_train = np.array(y_train, dtype=np.int32).reshape(-1)
X_test = np.array(X_test, dtype=np.float32)
y_test = np.array(y_test, dtype=np.int32).reshape(-1)
mean = X_train.mean(axis=(0, 1, 2), keepdims=True)
std = X_train.std(axis=(0, 1, 2), keepdims=True)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
y_train_one_hot = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test_one_hot = tf.keras.utils.to_categorical(y_test, num_classes=10)
# Define forward pass for a convolutional neural network.
# This function takes a batch of images as input and returns
# unscaled class scores (aka logits) from the last layer
def conv_net(X):
    initializer = VarianceScaling(scale=2.0)
    X = Conv2D(filters=32, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = MaxPooling2D()(X)
    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=128, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=256, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = GlobalAveragePooling2D()(X)
    X = Dense(10)(X)
    return X
# For training this model I use the Adam optimizer with learning_rate=3e-3
# Train the model for 10 epochs using the keras.Model API
def keras_model():
    inputs = Input(shape=(32, 32, 3))
    scores = conv_net(inputs)
    outputs = Activation('softmax')(scores)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=Adam(lr=3e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
model1 = keras_model()
model1.fit(X_train, y_train_one_hot, batch_size=128, epochs=10)
results1 = model1.evaluate(X_test, y_test_one_hot)
print(results1)
# The above usually gives 79-82% accuracy
# Now train the same model for 10 epochs using tf.estimator.Estimator API
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_train}, y=y_train,
                                                    batch_size=128, num_epochs=10, shuffle=True)
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_test}, y=y_test,
                                                   batch_size=128, num_epochs=1, shuffle=False)
def tf_estimator(features, labels, mode, params):
    X = features['X']
    scores = conv_net(X)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions={'scores': scores})
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=scores, labels=labels)
    metrics = {'accuracy': tf.metrics.accuracy(labels=labels, predictions=tf.argmax(scores, axis=-1))}
    optimizer = tf.train.AdamOptimizer(learning_rate=params['lr'], epsilon=params['epsilon'])
    step = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=tf.reduce_mean(loss), train_op=step, eval_metric_ops=metrics)
model2 = tf.estimator.Estimator(model_fn=tf_estimator, params={'lr': 3e-3, 'epsilon': tf.keras.backend.epsilon()})
model2.train(input_fn=train_input_fn)
results2 = model2.evaluate(input_fn=test_input_fn)
print(results2)
# This usually gives 75-78% accuracy
print('Keras accuracy:', results1[1])
print('Estimator accuracy:', results2['accuracy'])
I trained both models 30 times, for 10 epochs each time: the model trained with Keras averaged 0.8035 accuracy, while the model trained with the Estimator averaged 0.7631 (standard deviations 0.0065 and 0.0072 respectively). Accuracy is significantly higher if I use Keras. My question is: why does this happen? Am I doing something wrong or missing some important parameter? The architecture of the model is the same in both cases, and I use the same hyperparameters (I even set Adam's epsilon to the same value, although it doesn't really affect the overall result), yet the accuracies differ considerably.
I also wrote the training loop in raw TensorFlow and got the same accuracy as with the Estimator API (i.e. lower than with Keras). That made me suspect that some parameter's default value differs between Keras and TensorFlow, but they all appear to be identical.
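Since both networks are essentially stacks of Conv2D + BatchNormalization blocks, it may be worth noting that BatchNormalization is the one layer here with hidden training-time state: in training mode it normalizes each batch with the batch's own statistics while accumulating moving averages that the inference pass depends on. A minimal NumPy sketch of that mechanic (hypothetical helper names, not the TF implementation, and without the learnable scale/shift):

```python
import numpy as np

def batchnorm_train_step(x, moving_mean, moving_var, momentum=0.99, eps=1e-3):
    """One training-mode batch-norm step on a (batch, features) array.
    Normalizes with the *batch* statistics and returns updated moving
    averages -- the state that inference-mode normalization relies on."""
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    y = (x - batch_mean) / np.sqrt(batch_var + eps)
    new_mean = momentum * moving_mean + (1 - momentum) * batch_mean
    new_var = momentum * moving_var + (1 - momentum) * batch_var
    return y, new_mean, new_var

def batchnorm_inference(x, moving_mean, moving_var, eps=1e-3):
    # Inference normalizes with the accumulated moving averages instead
    # of the current batch's statistics.
    return (x - moving_mean) / np.sqrt(moving_var + eps)

rng = np.random.default_rng(0)
mean, var = np.zeros(4), np.ones(4)  # typical initial moving statistics
for _ in range(1000):
    batch = rng.normal(loc=5.0, scale=2.0, size=(128, 4))
    y, mean, var = batchnorm_train_step(batch, mean, var)

# After many updates the moving statistics approach the data statistics,
# so inference-mode output matches training-mode output; if the moving
# averages were never updated, inference would normalize with the stale
# initial zeros/ones and produce very different activations.
print(np.round(mean, 1), np.round(var, 1))
```

If the two training paths update (or fail to update) this state differently, evaluation accuracy can diverge even with identical architectures and hyperparameters, which is why I'd single this layer out.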
I have also tried other architectures, and sometimes the accuracy gap was smaller, but I couldn't pinpoint any particular layer type as the cause. It seems that the gap generally shrinks with shallower networks, but not always. For example, the accuracy gap is even larger with the following model:
def simple_conv_net(X):
    initializer = VarianceScaling(scale=2.0)
    X = Conv2D(filters=32, kernel_size=5, strides=2, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Flatten()(X)
    X = Dense(10)(X)
    return X
Again, I trained it 30 times for 10 epochs each, with the Adam optimizer and a 3e-3 learning rate. Keras averaged 0.6561 accuracy and the Estimator 0.6101 (standard deviations 0.0084 and 0.0111 respectively). What causes this difference?
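Just to rule out run-to-run noise: with 30 runs per setup, the gap is many standard errors wide. A back-of-envelope Welch-style t-statistic computed from the summary numbers above makes that concrete (a rough sketch, ignoring degrees-of-freedom details):

```python
import math

def welch_t(mean1, std1, mean2, std2, n):
    """Welch's t-statistic for two independent equal-size samples,
    computed from summary statistics (sample means and std deviations)."""
    se = math.sqrt((std1**2 + std2**2) / n)
    return (mean1 - mean2) / se

# 30 runs of 10 epochs each, numbers reported above
t_deep = welch_t(0.8035, 0.0065, 0.7631, 0.0072, n=30)     # first architecture
t_shallow = welch_t(0.6561, 0.0084, 0.6101, 0.0111, n=30)  # simple_conv_net
print(round(t_deep, 1), round(t_shallow, 1))  # both far beyond any plausible noise threshold
```

Both statistics come out above 15, so the difference is systematic, not sampling variance.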