Completely different MLP results from Keras and scikit-learn

Problem description

Running a single-hidden-layer MLP on MNIST, I get completely different results from Keras and scikit-learn.

import numpy as np
np.random.seed(5)
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '-1'
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers
from keras.optimizers import Adam
from keras.utils import np_utils
from sklearn.neural_network import MLPClassifier

(x_train, y_train), (x_test, y_test) = mnist.load_data()

num_classes = 10
batch_data = x_train[:2000]
batch_labels = y_train[:2000]

# flatten the 2D images into 784-dimensional vectors
batch_data_flat = batch_data.reshape(2000, 784)

# one-hot encoding
batch_labels_one_hot = np_utils.to_categorical(batch_labels, num_classes)

num_hidden_nodes = 100
alpha = 0.0001
batch_size = 128
beta_1 = 0.9
beta_2 = 0.999
epsilon = 1e-08
learning_rate_init = 0.001
epochs = 200

# keras
keras_model = Sequential()
keras_model.add(Dense(num_hidden_nodes, activation='relu',
                      kernel_regularizer=regularizers.l2(alpha),
                      kernel_initializer='glorot_uniform',
                      bias_initializer='glorot_uniform'))
keras_model.add(Dense(num_classes, activation='softmax',
                      kernel_regularizer=regularizers.l2(alpha),
                      kernel_initializer='glorot_uniform',
                      bias_initializer='glorot_uniform'))

keras_optim = Adam(lr=learning_rate_init, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon)
keras_model.compile(optimizer=keras_optim, loss='categorical_crossentropy', metrics=['accuracy'])

keras_model.fit(batch_data_flat, batch_labels_one_hot, batch_size=batch_size, epochs=epochs, verbose=0)

# sklearn
sklearn_model = MLPClassifier(hidden_layer_sizes=(num_hidden_nodes,), activation='relu', solver='adam',
                              alpha=alpha, batch_size=batch_size, learning_rate_init=learning_rate_init,
                              max_iter=epochs, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon)

sklearn_model.fit(batch_data_flat, batch_labels_one_hot)

# evaluate both on their training data
score_keras = keras_model.evaluate(batch_data_flat, batch_labels_one_hot)
score_sklearn = sklearn_model.score(batch_data_flat, batch_labels_one_hot)
print("Acc: keras %f, sklearn %f" % (score_keras[1], score_sklearn))

Output: Acc: keras 0.182500, sklearn 1.000000

The only difference I see is in the Glorot initialization: scikit-learn initializes the final layer with bounds of sqrt(2 / (fan_in + fan_out)), while Keras uses sqrt(6 / (fan_in + fan_out)). But I don't think that should cause such a difference. Am I forgetting something here?
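For scale, here is the difference those two formulas actually produce for the final 100 → 10 layer (a quick sketch using only the formulas quoted above):

import numpy as np

# Uniform-initialization bounds for the final layer (fan_in=100, fan_out=10),
# using the two formulas quoted in the question.
fan_in, fan_out = 100, 10
bound_sklearn = np.sqrt(2.0 / (fan_in + fan_out))  # ~= 0.135
bound_keras = np.sqrt(6.0 / (fan_in + fan_out))    # ~= 0.233
print("sklearn bound: %.3f, keras bound: %.3f" % (bound_sklearn, bound_keras))

The bounds differ only by a factor of sqrt(3), which supports the intuition that the weight initialization alone should not explain an accuracy gap of 0.18 vs. 1.00.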

scikit-learn 0.19.1, Keras 2.2.0 (TensorFlow 1.9.0 backend)

Tags: tensorflow, scikit-learn, keras

Solution

You should probably use 'zeros' instead of 'glorot_uniform' to initialize the biases. The Glorot scheme is derived for weight matrices, not bias vectors, and zero biases are also what Keras' Dense layer uses by default.
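A minimal sketch of the fix, keeping everything else from the question unchanged (since 'zeros' is the Keras default, simply dropping the bias_initializer argument has the same effect):

# keras, with biases initialized to zero instead of glorot_uniform
keras_model = Sequential()
keras_model.add(Dense(num_hidden_nodes, activation='relu',
                      kernel_regularizer=regularizers.l2(alpha),
                      kernel_initializer='glorot_uniform',
                      bias_initializer='zeros'))
keras_model.add(Dense(num_classes, activation='softmax',
                      kernel_regularizer=regularizers.l2(alpha),
                      kernel_initializer='glorot_uniform',
                      bias_initializer='zeros'))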

