python - The same function gives different values as a Keras loss and as a metric, even without regularization
Problem description
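I'm building a custom U-Net for a semantic segmentation problem, but I'm seeing strange behaviour in how the loss and the metric are calculated during training: the two values differ very significantly.
UPDATE: a minimal reproducible example is at the bottom.
I've read this one (1), this one (2), another one (3) and yet another one (4), but haven't found a suitable answer.
When training the model I use the same function for the loss and for the metric, and the results are very different.
First example, with categorical_crossentropy (I'm using a very small toy set just to show it):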
from tensorflow.python.keras import losses
model.compile(optimizer='adam', loss=losses.categorical_crossentropy,
              metrics=[losses.categorical_crossentropy])
The output I get is:
4/4 [===] - 3s 677ms/step - loss: 4.1023 - categorical_crossentropy: 1.0256
- val_loss: 1.3864 - val_categorical_crossentropy: 1.3864
As you can see, loss and categorical_crossentropy differ by a factor of about 4.
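A Keras-independent reference value can help decide which of the two numbers is the plain per-pixel mean. The sketch below is a minimal NumPy version of categorical cross-entropy, not the Keras implementation. The toy batch, the four classes and the uniform predictions are assumptions made for illustration (the question does not state how many classes the toy set has). Under those assumptions the mean is ln 4, about 1.3863, which is close to the val_loss of 1.3864 logged above.

```python
import numpy as np


def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Cross-entropy over the last axis: -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


# Hypothetical toy batch: 2 images of 3x3 pixels, 4 classes, one-hot labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=(2, 3, 3))
y_true = np.eye(4)[labels]             # shape (2, 3, 3, 4)
y_pred = np.full(y_true.shape, 0.25)   # assumed uniform predictions

# Plain mean over every pixel of every sample in the batch.
reference = categorical_crossentropy_np(y_true, y_pred).mean()
print(round(reference, 4))  # 1.3863 (= ln 4)
```

Feeding the real labels and the output of model.predict into the same function would give a number to compare against both the reported loss and the reported metric.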
If I use a custom metric, the difference gets even bigger:
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.losses import categorical_crossentropy

def dice_cross_loss(y_true, y_pred, epsilon=1e-6, smooth=1):
    # per-pixel categorical cross-entropy
    ce_loss = categorical_crossentropy(y_true, y_pred)
    # Dice coefficient computed over the whole flattened batch
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice_coef = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + epsilon)
    return ce_loss - K.log(dice_coef + epsilon)

model.compile(optimizer='adam', loss=dice_cross_loss,
              metrics=[dice_cross_loss])
When I run it, it's even worse:
4/4 [===] - 3s 682ms/step - loss: 20.9706 - dice_cross_loss: 5.2428
- val_loss: 4.3681 - val_dice_cross_loss: 4.3681
With larger examples, the difference between loss and metric can be more than tenfold.
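For the custom function, the same kind of reference can be written in NumPy. This is only a sketch that mirrors the formula of dice_cross_loss term by term; the name dice_cross_loss_np, the clipping constant and the toy inputs are hypothetical. One thing it makes visible is that flattening also collapses the batch axis, so the Dice term is a single scalar per batch while the cross-entropy term stays per pixel, and the value therefore depends on how samples are grouped into batches.

```python
import numpy as np


def dice_cross_loss_np(y_true, y_pred, epsilon=1e-6, smooth=1):
    """NumPy mirror of dice_cross_loss: per-pixel CE minus log of a batch-wide Dice."""
    clipped = np.clip(y_pred, 1e-7, 1.0 - 1e-7)           # avoid log(0)
    ce_loss = -np.sum(y_true * np.log(clipped), axis=-1)   # shape: y_true.shape[:-1]
    y_true_f = y_true.ravel()                              # flattens the batch axis too
    y_pred_f = y_pred.ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    dice_coef = (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + epsilon)
    return ce_loss - np.log(dice_coef + epsilon)           # scalar Dice term is broadcast


# Hypothetical toy batch: 2 images of 3x3 pixels, 4 classes, one-hot labels.
rng = np.random.default_rng(0)
y_true = np.eye(4)[rng.integers(0, 4, size=(2, 3, 3))]
y_pred = np.full(y_true.shape, 0.25)   # assumed uniform predictions

per_pixel = dice_cross_loss_np(y_true, y_pred)
print(per_pixel.shape, float(per_pixel.mean()))
```

Evaluating it once on a full batch and once on each sample separately shows how much of a gap comes from the batch-wide Dice term alone.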
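After reading (1), I removed from the model all regularization layers that could behave differently at evaluation time. No dropout, no batchnorm. There is pooling, but that shouldn't be the cause.
The fitting code is nothing special: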
model.fit(x=data_x, y=data_y, batch_size=batch_size, epochs=epochs,
          verbose=1, validation_split=0.2, shuffle=True, workers=4)
This is the code of the network:
from tensorflow.python.keras.layers import (Input, Conv2D, Conv2DTranspose, MaxPooling2D,
                                            Activation, ReLU, concatenate)
from tensorflow.python.keras.models import Model


class CustomUnet(object):
    def __init__(self, image_shape=(20, 30, 3), n_class=2, **params):
        # read parameters
        initial_filters = params.get("initial_filters", 64)
        conv_activations = params.get("conv_activations", ReLU())
        final_activation = params.get("final_activation", "softmax")
        self.name = "CustomUnet"

        input_layer = Input(shape=image_shape, name='image_input')

        # contracting path: conv block followed by 2x2 max pooling
        conv1 = self.conv_block(input_layer, nfilters=initial_filters, activation=conv_activations, name="con1")
        conv1_out = MaxPooling2D(pool_size=(2, 2))(conv1)
        conv2 = self.conv_block(conv1_out, nfilters=initial_filters*2, activation=conv_activations, name="con2")
        conv2_out = MaxPooling2D(pool_size=(2, 2))(conv2)
        conv3 = self.conv_block(conv2_out, nfilters=initial_filters*4, activation=conv_activations, name="con3")
        conv3_out = MaxPooling2D(pool_size=(2, 2))(conv3)
        conv4 = self.conv_block(conv3_out, nfilters=initial_filters*8, activation=conv_activations, name="con4")

        # number jumps from 4 to 7 because it used to have an extra layer and haven't got to refactor properly.
        # expanding path: transposed conv, skip connection, conv block
        deconv7 = self.deconv_block(conv4, residual=conv3, nfilters=initial_filters*4, name="decon7",
                                    conv_activations=conv_activations)
        deconv8 = self.deconv_block(deconv7, residual=conv2, nfilters=initial_filters*2, name="decon8",
                                    conv_activations=conv_activations)
        deconv9 = self.deconv_block(deconv8, residual=conv1, nfilters=initial_filters, name="decon9",
                                    conv_activations=conv_activations)

        output_layer = Conv2D(filters=n_class, kernel_size=(1, 1))(deconv9)
        # final activation (softmax by default) turns the logits into class probabilities
        output_layer = Activation(final_activation)(output_layer)

        model = Model(inputs=input_layer, outputs=output_layer, name='Unet')
        self.model = model

    def conv_block(self, input_layer, nfilters, size=3, padding='same', initializer="he_normal", name="none",
                   activation=ReLU()):
        # two (conv -> activation) stages with the same number of filters
        x = Conv2D(filters=nfilters, kernel_size=(size, size), padding=padding, kernel_initializer=initializer)(input_layer)
        x = Activation(activation)(x)
        x = Conv2D(filters=nfilters, kernel_size=(size, size), padding=padding, kernel_initializer=initializer)(x)
        x = Activation(activation)(x)
        return x

    def deconv_block(self, input_layer, residual, nfilters, size=3, padding='same', strides=(2, 2), name="none",
                     conv_activations=ReLU()):
        # upsample, concatenate with the skip connection, then convolve
        y = Conv2DTranspose(nfilters, kernel_size=(size, size), strides=strides, padding=padding)(input_layer)
        y = concatenate([y, residual])  # , axis=3)
        y = self.conv_block(y, nfilters, activation=conv_activations)
        return y
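Is this expected behaviour? What am I not understanding about the difference between how the loss and the metric are calculated? Did I mess something up in the code?
Thanks!!
Minimal reproducible example: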
from tensorflow.python.keras.layers import Input, Conv2D, Activation
from tensorflow.python.keras.models import Model
import numpy as np
input_data = np.random.rand(100, 300, 300, 3) # 300x300 images
out_data = np.random.randint(0, 2, size=(100, 300, 300, 4)) # 4 classes

def simple_model(image_shape, n_class):
    input_layer = Input(shape=image_shape, name='image_input')
    x = Conv2D(filters=3, kernel_size=(3, 3), padding="same", kernel_initializer="he_normal")(input_layer)
    x = Activation("relu")(x)
    x = Conv2D(filters=3, kernel_size=(3, 3), padding="same", kernel_initializer="he_normal")(x)
    x = Activation("relu")(x)
    x = Conv2D(filters=n_class, kernel_size=(1, 1))(x)
    output_layer = Activation("softmax")(x)
    model = Model(inputs=input_layer, outputs=output_layer, name='Sample')
    return model

sample_model = simple_model(input_data[0].shape, out_data.shape[-1])
sample_model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["categorical_crossentropy"])

batch_size = 5
steps = input_data.shape[0] // batch_size
epochs = 20
history = sample_model.fit(x=input_data, y=out_data, batch_size=batch_size, epochs=epochs,  # , callbacks=callbacks,
                           verbose=1, validation_split=0.2, workers=1)
The results I get are still weird:
80/80 [===] - 9s 108ms/step - loss: 14.0259 - categorical_crossentropy: 2.8051 - val_loss: 13.9439 - val_categorical_crossentropy: 2.7885
So loss: 14.0259 - categorical_crossentropy: 2.8051. Now I'm lost...
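Some plain arithmetic on the logged numbers may help narrow this down. The sketch below only divides the loss by the metric for each log line quoted in the question. In the minimal example the ratio comes out at about 5, which coincides with batch_size = 5; on the toy set it is about 4 for training and exactly 1 for validation. That pattern would be consistent with the loss being scaled by the number of samples per batch, but this is an observation from the logs, not a confirmed diagnosis. The second part checks the metric value itself: out_data is drawn with randint(0, 2) on all four channels, so the labels are multi-hot with two active classes per pixel on average, and near-uniform predictions (an assumption) would give roughly 2 * ln 4, about 2.77. The smaller grid is used only to keep the check fast.

```python
import numpy as np

# (loss, metric) pairs copied from the training logs quoted in the question.
logged = {
    "toy set, train, categorical_crossentropy": (4.1023, 1.0256),
    "toy set, val, categorical_crossentropy": (1.3864, 1.3864),
    "toy set, train, dice_cross_loss": (20.9706, 5.2428),
    "toy set, val, dice_cross_loss": (4.3681, 4.3681),
    "minimal example, train": (14.0259, 2.8051),
    "minimal example, val": (13.9439, 2.7885),
}

ratios = {name: loss / metric for name, (loss, metric) in logged.items()}
for name, ratio in ratios.items():
    print(f"{name:<42} loss / metric = {ratio:.4f}")

# Labels drawn like out_data in the minimal example, on a smaller grid.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(10, 30, 30, 4))   # multi-hot, not one-hot
y_pred = np.full(y_true.shape, 0.25)                # assumed near-uniform softmax output

# Mean cross-entropy over all pixels: each active label contributes ln 4.
expected_metric = -np.sum(y_true * np.log(y_pred), axis=-1).mean()
print(f"expected metric = {expected_metric:.3f}, 2 * ln 4 = {2 * np.log(4):.3f}")
```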
Solution
There is a solution that works.
It seems to be an issue with how the TF libraries are imported.
If I do
from tensorflow.python.keras.layers import Input, Conv2D, Activation
from tensorflow.python.keras.models import Model
I get the weird behaviour described above.
If I replace it with
from keras.layers import Input, Conv2D, Activation
from keras.models import Model
I get much more consistent numbers:
5/80 [>.....] - ETA: 20s - loss: 2.7886 - categorical_crossentropy: 2.7879
10/80 [==>...] - ETA: 12s - loss: 2.7904 - categorical_crossentropy: 2.7899
15/80 [====>.] - ETA: 9s - loss: 2.7900 - categorical_crossentropy: 2.7896
There are still some differences, but they seem much more reasonable. Still, if you know the reason, please let me know!