tensorflow - Keras and scikit-learn MLPs give completely different results
Question
Training a single-hidden-layer MLP on MNIST, I get very different results from Keras and from sklearn.
import numpy as np
np.random.seed(5)
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '-1'

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers
from keras.optimizers import Adam
from keras.utils import np_utils
from sklearn.neural_network import MLPClassifier

(x_train, y_train), (x_test, y_test) = mnist.load_data()

num_classes = 10
batch_data = x_train[:2000]
batch_labels = y_train[:2000]

# flatten the 2D images into 784-element vectors
batch_data_flat = batch_data.reshape(2000, 784)

# one-hot encode the labels
batch_labels_one_hot = np_utils.to_categorical(batch_labels, num_classes)

num_hidden_nodes = 100
alpha = 0.0001
batch_size = 128
beta_1 = 0.9
beta_2 = 0.999
epsilon = 1e-08
learning_rate_init = 0.001
epochs = 200

# keras
keras_model = Sequential()
keras_model.add(Dense(num_hidden_nodes, activation='relu',
                      kernel_regularizer=regularizers.l2(alpha),
                      kernel_initializer='glorot_uniform',
                      bias_initializer='glorot_uniform'))
keras_model.add(Dense(num_classes, activation='softmax',
                      kernel_regularizer=regularizers.l2(alpha),
                      kernel_initializer='glorot_uniform',
                      bias_initializer='glorot_uniform'))
keras_optim = Adam(lr=learning_rate_init, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon)
keras_model.compile(optimizer=keras_optim, loss='categorical_crossentropy', metrics=['accuracy'])
keras_model.fit(batch_data_flat, batch_labels_one_hot, batch_size=batch_size, epochs=epochs, verbose=0)

# sklearn
sklearn_model = MLPClassifier(hidden_layer_sizes=(num_hidden_nodes,), activation='relu', solver='adam',
                              alpha=alpha, batch_size=batch_size, learning_rate_init=learning_rate_init,
                              max_iter=epochs, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon)
sklearn_model.fit(batch_data_flat, batch_labels_one_hot)

# evaluate both on their training data
score_keras = keras_model.evaluate(batch_data_flat, batch_labels_one_hot)
score_sklearn = sklearn_model.score(batch_data_flat, batch_labels_one_hot)
print("Acc: keras %f, sklearn %f" % (score_keras[1], score_sklearn))
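Output: Acc: keras 0.182500, sklearn 1.000000
The only difference I can see is that scikit-learn computes the Glorot initialization of the final layer with sqrt(2 / (fan_in + fan_out)), whereas Keras uses sqrt(6 / (fan_in + fan_out)). I don't think that should cause a difference this large, though. Am I missing something?
scikit-learn 0.19.1, Keras 2.2.0 (TensorFlow 1.9.0 backend)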
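For reference, the two expressions quoted in the question are related in the standard Glorot scheme: a uniform distribution on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)) has standard deviation sqrt(2 / (fan_in + fan_out)). The sketch below checks this for the layer sizes used above. The function names are illustrative, and it says nothing about how either library actually applies its value (as a bound or as a standard deviation).

```python
import math


def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the Glorot uniform distribution U(-limit, limit)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_normal_stddev(fan_in, fan_out):
    """Standard deviation used by the Glorot normal variant."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def uniform_stddev(limit):
    """Standard deviation of U(-limit, limit), which is limit / sqrt(3)."""
    return limit / math.sqrt(3.0)


# Layer shapes from the question: 784 -> 100 (hidden) and 100 -> 10 (output).
for name, (fan_in, fan_out) in [("hidden", (784, 100)), ("output", (100, 10))]:
    limit = glorot_uniform_limit(fan_in, fan_out)
    print("%s: limit=%.5f, std of uniform=%.5f, glorot normal std=%.5f" % (
        name, limit, uniform_stddev(limit), glorot_normal_stddev(fan_in, fan_out)))
```

If one library used sqrt(2 / ...) as a uniform bound rather than as a standard deviation, the initial weights would be smaller by a factor of sqrt(3), which is a modest difference in scale.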
Answer
You should probably initialize the biases with 'zeros' rather than 'glorot_uniform'.
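A minimal sketch of that change, assuming the same layer sizes and L2 strength as in the question. The helpers layer_init and build_keras_model are hypothetical names introduced here for illustration. Note that 'zeros' is also the Keras default for bias_initializer, so simply omitting the argument should have the same effect. The answer does not report the resulting accuracy, so none is claimed here.

```python
def layer_init(activation):
    """Initializer settings shared by both Dense layers (illustrative helper)."""
    return {
        "activation": activation,
        "kernel_initializer": "glorot_uniform",  # unchanged from the question
        "bias_initializer": "zeros",             # was 'glorot_uniform'
    }


def build_keras_model(num_hidden_nodes=100, num_classes=10, alpha=0.0001):
    """Rebuild the question's Keras model with zero-initialized biases."""
    # Imported lazily so layer_init can be inspected without Keras installed.
    from keras.models import Sequential
    from keras.layers import Dense
    from keras import regularizers

    model = Sequential()
    model.add(Dense(num_hidden_nodes,
                    kernel_regularizer=regularizers.l2(alpha),
                    **layer_init("relu")))
    model.add(Dense(num_classes,
                    kernel_regularizer=regularizers.l2(alpha),
                    **layer_init("softmax")))
    return model
```

The returned model can then be compiled and fitted exactly as keras_model is in the question.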