Two different styles of TensorFlow implementation of the same network architecture lead to two different results and behaviors?

Problem description

When I implemented my proposed method using the second implementation style (see below), I noticed that the algorithm behaved really strangely. More precisely, the accuracy decreased and the loss increased as the number of epochs grew.

So I narrowed the problem down, and in the end I decided to adapt some code from the official TensorFlow pages to check what was happening. Following the explanations on the official TF v2 web pages, I tried the two implementation styles shown below.

The first one is:

import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
learning_rate = 1e-4
batch_size = 100
n_classes = 2
n_units = 80


# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=1000, n_features=10, n_informative=4,
                                 n_redundant=2, n_repeated=2, n_classes=2,
                                 n_clusters_per_class=2, weights=[0.5, 0.5],
                                 flip_y=0.01, class_sep=1.0, hypercube=True,
                                 shift=0.0, scale=1.0, shuffle=True, random_state=42)

x_in = x_in.astype('float32')
y_in = y_in.astype('float32').reshape(-1, 1)

one_hot_encoder = OneHotEncoder(sparse=False)  # note: scikit-learn >= 1.2 renames this to sparse_output=False
y_in = one_hot_encoder.fit_transform(y_in)
y_in = y_in.astype('float32')

x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)
print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)

V = x_train.shape[1]

model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(n_units, activation='relu', input_shape=(V,)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(n_classes)
    ])

loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)

model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)

model.evaluate(x_test, y_test, verbose=2)

The output is as expected:

Epoch 1/5
600/600 [==============================] - 0s 419us/sample - loss: 0.7114 - accuracy: 0.5350
Epoch 2/5
600/600 [==============================] - 0s 42us/sample - loss: 0.6149 - accuracy: 0.6050
Epoch 3/5
600/600 [==============================] - 0s 39us/sample - loss: 0.5450 - accuracy: 0.6925
Epoch 4/5
600/600 [==============================] - 0s 46us/sample - loss: 0.4895 - accuracy: 0.7425
Epoch 5/5
600/600 [==============================] - 0s 40us/sample - loss: 0.4579 - accuracy: 0.7825

test: 200/200 - 0s - loss: 0.4110 - accuracy: 0.8350

More precisely, as the number of epochs increases, the training accuracy increases and the loss decreases, which is expected and normal.

However, the following code block, adapted from this link:

TensorFlow 2 quickstart for experts

looks like this:

import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

learning_rate = 1e-4
batch_size = 100
n_classes = 2
n_units = 80

# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=1000, n_features=10, n_informative=4,
                                 n_redundant=2, n_repeated=2, n_classes=2,
                                 n_clusters_per_class=2, weights=[0.5, 0.5],
                                 flip_y=0.01, class_sep=1.0, hypercube=True,
                                 shift=0.0, scale=1.0, shuffle=True, random_state=42)

x_in = x_in.astype('float32')
y_in = y_in.astype('float32').reshape(-1, 1)

one_hot_encoder = OneHotEncoder(sparse=False)  # note: scikit-learn >= 1.2 renames this to sparse_output=False
y_in = one_hot_encoder.fit_transform(y_in)
y_in = y_in.astype('float32')

x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)

print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)

training_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)
valid_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(batch_size)

testing_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)

V = x_train.shape[1]


class MyModel(tf.keras.models.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.d1 = tf.keras.layers.Dense(n_units, activation='relu', input_shape=(V,))
        self.d2 = tf.keras.layers.Dropout(0.2)
        self.d3 = tf.keras.layers.Dense(n_classes)

    def call(self, x):
        x = self.d1(x)
        x = self.d2(x)
        return self.d3(x)

# Create an instance of the model
model = MyModel()

loss_object = tf.keras.losses.BinaryCrossentropy(from_logits=True)

optimizer = tf.keras.optimizers.Adam()

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.BinaryCrossentropy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.BinaryCrossentropy(name='test_accuracy')


@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        predictions = model(images)  # training=True
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)


@tf.function
def test_step(images, labels):
    # training=False is only needed if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    predictions = model(images)  # training=False
    t_loss = loss_object(labels, predictions)

    test_loss(t_loss)
    test_accuracy(labels, predictions)


EPOCHS = 5

for epoch in range(EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for images, labels in training_dataset:
        train_step(images, labels)

    for test_images, test_labels in testing_dataset:
        test_step(test_images, test_labels)

    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch + 1, train_loss.result(), train_accuracy.result(),
                          test_loss.result(), test_accuracy.result()))

Its behavior is really strange. Here is the output of this code:

Epoch 1, Loss: 0.7299721837043762, Accuracy: 3.8341376781463623, Test Loss: 0.7290592193603516, Test Accuracy: 3.6925911903381348
Epoch 2, Loss: 0.6725851893424988, Accuracy: 3.1141700744628906, Test Loss: 0.6695905923843384, Test Accuracy: 3.2315549850463867
Epoch 3, Loss: 0.6256862878799438, Accuracy: 2.75959849357605, Test Loss: 0.6216427087783813, Test Accuracy: 2.920461416244507
Epoch 4, Loss: 0.5873140096664429, Accuracy: 2.4249706268310547, Test Loss: 0.5828182101249695, Test Accuracy: 2.575272560119629
Epoch 5, Loss: 0.555053174495697, Accuracy: 2.2128372192382812, Test Loss: 0.5501811504364014, Test Accuracy: 2.264410972595215

As can be seen, not only are the "accuracy" values strange (well above 1), but they also fail to increase: they decrease as the number of epochs grows.

Can you explain what is going on here?

Tags: python, tensorflow, machine-learning, keras

Solution


As pointed out in the comments, I made a mistake with the evaluation metric: tf.keras.metrics.BinaryCrossentropy measures cross-entropy, not accuracy, so the "accuracy" I printed was really just another loss value, which is why it decreased as training progressed. I should have used tf.keras.metrics.BinaryAccuracy instead.
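For illustration, here is a minimal sketch of the corrected metric setup. The threshold=0.0 argument is my own addition, not part of the original answer: since the model outputs raw logits rather than probabilities (the loss uses from_logits=True), thresholding logits at 0.0 matches the default probability threshold of 0.5 (sigmoid(0.0) == 0.5):

# Corrected metrics: BinaryAccuracy instead of the misused
# tf.keras.metrics.BinaryCrossentropy.
# threshold=0.0 because the model emits logits, not probabilities
# (sigmoid(0.0) == 0.5, the usual decision boundary).
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.BinaryAccuracy(name='train_accuracy', threshold=0.0)

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.BinaryAccuracy(name='test_accuracy', threshold=0.0)

With this change the reported accuracy stays within [0, 1] and, for this problem, should increase over the epochs, mirroring the Keras fit/evaluate version.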

Additionally, it is better to edit the call method in the subclassed version as follows, so that Dropout is applied only during training:

def call(self, x, training=False):
    x = self.d1(x)
    # Apply Dropout only in training mode; at inference the layer is skipped
    if training:
        x = self.d2(x, training=training)
    return self.d3(x)
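For completeness, a sketch of how the two step functions would then pass the flag; these mirror train_step/test_step from the question, and the explicit training=True/False arguments are the only change:

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # training=True activates the Dropout branch in call()
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)


@tf.function
def test_step(images, labels):
    # training=False (the default here) disables Dropout at evaluation time
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)

    test_loss(t_loss)
    test_accuracy(labels, predictions)

Note that in the question's code the training argument was commented out in both step functions, so Dropout was never actually toggled; passing it explicitly makes the custom loop behave the same way as Keras's fit and evaluate, which set the flag automatically.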
