python - Do two different styles of TensorFlow implementation of the same network architecture lead to different results and behavior?
Problem description
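- OS platform: Linux CentOS 7.6
- Distribution: Intel Xeon Gold 6152 (22x3.70 GHz);
- GPU model: NVIDIA Tesla V100 32 GB;
- Number of nodes/CPUs/cores/GPUs: 26/52/1144/104;
- TensorFlow installed from (source or binary): official webpage
- TensorFlow version (use the command below): 2.1.0
- Python version: 3.6.8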
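Describe the problem:
When I implemented my proposed method using the second implementation style (see below), I noticed that the algorithm behaved strangely. More precisely, as the number of epochs increased, the accuracy decreased and the loss increased.
So I narrowed the problem down and finally decided to modify some code from the official TensorFlow pages to check what was happening. As explained on the official TF v2 website, there are two implementation styles, and I tried both, as shown below.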
First, I modified the code provided in "TF v2 getting started" as follows:
import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
learning_rate = 1e-4
batch_size = 100
n_classes = 2
n_units = 80
# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=1000, n_features=10, n_informative=4, n_redundant=2, n_repeated=2, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],
                                 flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=42)
x_in = x_in.astype('float32')
y_in = y_in.astype('float32').reshape(-1, 1)
one_hot_encoder = OneHotEncoder(sparse=False)
y_in = one_hot_encoder.fit_transform(y_in)
y_in = y_in.astype('float32')
x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)
print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)
V = x_train.shape[1]
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(n_units, activation='relu', input_shape=(V,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(n_classes)
])
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
The output is as expected:
600/600 [==============================] - 0s 419us/sample - loss: 0.7114 - accuracy: 0.5350
Epoch 2/5
600/600 [==============================] - 0s 42us/sample - loss: 0.6149 - accuracy: 0.6050
Epoch 3/5
600/600 [==============================] - 0s 39us/sample - loss: 0.5450 - accuracy: 0.6925
Epoch 4/5
600/600 [==============================] - 0s 46us/sample - loss: 0.4895 - accuracy: 0.7425
Epoch 5/5
600/600 [==============================] - 0s 40us/sample - loss: 0.4579 - accuracy: 0.7825
test: 200/200 - 0s - loss: 0.4110 - accuracy: 0.8350
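More precisely, as the number of epochs increases, the training accuracy rises and the loss falls, which is expected and normal.
However, the second implementation, adapted from the official TF v2 pages, is as follows: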
import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
learning_rate = 1e-4
batch_size = 100
n_classes = 2
n_units = 80
# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=1000, n_features=10, n_informative=4, n_redundant=2, n_repeated=2, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=42)
x_in = x_in.astype('float32')
y_in = y_in.astype('float32').reshape(-1, 1)
one_hot_encoder = OneHotEncoder(sparse=False)
y_in = one_hot_encoder.fit_transform(y_in)
y_in = y_in.astype('float32')
x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)
print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)
training_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)
valid_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(batch_size)
testing_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)
V = x_train.shape[1]
class MyModel(tf.keras.models.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.d1 = tf.keras.layers.Dense(n_units, activation='relu', input_shape=(V,))
        self.d2 = tf.keras.layers.Dropout(0.2)
        self.d3 = tf.keras.layers.Dense(n_classes,)
    def call(self, x):
        x = self.d1(x)
        x = self.d2(x)
        return self.d3(x)
# Create an instance of the model
model = MyModel()
loss_object = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.BinaryCrossentropy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.BinaryCrossentropy(name='test_accuracy')
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        predictions = model(images,) # training=True
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)
@tf.function
def test_step(images, labels):
    # training=False is only needed if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    predictions = model(images,) # training=False
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)
EPOCHS = 5
for epoch in range(EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()
    for images, labels in training_dataset:
        train_step(images, labels)
    for test_images, test_labels in testing_dataset:
        test_step(test_images, test_labels)
    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch + 1,train_loss.result(), train_accuracy.result(), test_loss.result(), test_accuracy.result()))
The behavior is really strange. Here is the output of this code:
Epoch 1, Loss: 0.7299721837043762, Accuracy: 3.8341376781463623, Test Loss: 0.7290592193603516, Test Accuracy: 3.6925911903381348
Epoch 2, Loss: 0.6725851893424988, Accuracy: 3.1141700744628906, Test Loss: 0.6695905923843384, Test Accuracy: 3.2315549850463867
Epoch 3, Loss: 0.6256862878799438, Accuracy: 2.75959849357605, Test Loss: 0.6216427087783813, Test Accuracy: 2.920461416244507
Epoch 4, Loss: 0.5873140096664429, Accuracy: 2.4249706268310547, Test Loss: 0.5828182101249695, Test Accuracy: 2.575272560119629
Epoch 5, Loss: 0.555053174495697, Accuracy: 2.2128372192382812, Test Loss: 0.5501811504364014, Test Accuracy: 2.264410972595215
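As can be seen, not only are the accuracy values strange, but instead of increasing they decrease as the number of epochs grows.
Can you explain what is happening here?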
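Solution
As pointed out in the comments, I made a mistake in the evaluation metric: train_accuracy and test_accuracy were built from BinaryCrossentropy, which is a loss-style metric, not an accuracy. I should have used BinaryAccuracy.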
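To see why the reported "accuracy" could exceed 1 and fall from epoch to epoch, it may help to compute both quantities by hand. The following is a minimal pure-Python sketch, not TensorFlow's implementation: the labels and logits are made-up values, and the helper names (`sigmoid`, `binary_accuracy`, `binary_crossentropy`) are hypothetical. It assumes, as with the Keras default `from_logits=False`, that the cross-entropy metric treats its inputs as probabilities and clips them into the open interval (0, 1).

```python
import math

EPS = 1e-7  # assumed clipping constant, mirroring the Keras default epsilon


def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_accuracy(labels, probs, threshold=0.5):
    """Fraction of entries where (prob > threshold) agrees with the 0/1 label."""
    hits = [float((p > threshold) == (y == 1.0))
            for row_y, row_p in zip(labels, probs)
            for y, p in zip(row_y, row_p)]
    return sum(hits) / len(hits)


def binary_crossentropy(labels, probs):
    """Mean cross-entropy, treating `probs` as probabilities (no sigmoid applied)."""
    total, count = 0.0, 0
    for row_y, row_p in zip(labels, probs):
        for y, p in zip(row_y, row_p):
            p = min(max(p, EPS), 1.0 - EPS)  # clip so that log() stays finite
            total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            count += 1
    return total / count


# Hypothetical one-hot labels and raw logits for three samples.
labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
logits = [[2.0, -1.5], [-0.5, 1.0], [-0.3, 0.4]]

# A real accuracy: convert logits to probabilities, then threshold at 0.5.
probs = [[sigmoid(z) for z in row] for row in logits]
accuracy = binary_accuracy(labels, probs)

# What the question effectively tracked: cross-entropy on unconverted logits.
mislabeled_accuracy = binary_crossentropy(labels, logits)

print("binary accuracy:", accuracy)
print("cross-entropy reported as accuracy:", mislabeled_accuracy)
```

Here the true accuracy stays within [0, 1], while the cross-entropy value is unbounded above and shrinks as predictions improve, which is consistent with the decreasing "accuracy" in the question. In the custom training loop, the corresponding change would presumably be to build `train_accuracy` and `test_accuracy` from `tf.keras.metrics.BinaryAccuracy`; because the model outputs logits, the predictions would likely need to pass through a sigmoid first (or the metric's threshold be adjusted) for the default threshold of 0.5 to be meaningful.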
In addition, it is better to edit the call method in the advanced (subclassed) version as follows:
def call(self, x, training=False):
    x = self.d1(x)
    if training:
        x = self.d2(x, training=training)
    return self.d3(x)
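The role of the `training` flag can be illustrated with a small pure-Python sketch of inverted dropout. This is an illustration under stated assumptions rather than the Keras implementation: the `dropout` helper and its `rng` parameter are hypothetical, and the scaling by `1 / (1 - rate)` follows the usual inverted-dropout convention.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout on a list of floats.

    With training=False the input passes through unchanged. With
    training=True each element is zeroed with probability `rate`, and the
    survivors are scaled by 1 / (1 - rate) to keep the expected sum stable.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)           # seeded only to make the demo repeatable
activations = [1.0, 2.0, 3.0, 4.0]

inference_out = dropout(activations, rate=0.2, training=False, rng=rng)
training_out = dropout(activations, rate=0.2, training=True, rng=rng)

print("inference:", inference_out)  # identical to the input
print("training: ", training_out)   # each entry is 0.0 or the input / 0.8
```

A Keras `Dropout` layer already passes its input through unchanged when `training=False`, so the explicit `if training:` guard above is arguably redundant. What probably matters more is that the training step calls the model as `model(images, training=True)` and the test step as `model(images, training=False)`. In the question's code the flag was left out of both calls, which most likely means dropout was never active in the custom loop, unlike in `model.fit`.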