Training a CNN Model in Google Colab Gets Stuck at the First Epoch

Problem Description

I created a model to identify plant diseases, and I want it to recognize 10 different diseases. It runs fine in a Jupyter notebook, but it is slow there because of GPU limitations, so I decided to run the model in Google Colab. In Colab, however, it does not run at all: it gets stuck on the first epoch.

The code I used to build the model is as follows:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

BATCH_SIZE = 64
IMAGE_SIZE = 256
CHANNELS = 3
EPOCHS = 10

# Load the images from Google Drive; labels are inferred from the subdirectory names
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/google-colab-files/PlantVillage",
    seed=123,
    shuffle=True,
    image_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE
)
def get_dataset_partitions_tf(ds, train_split=0.8, val_split=0.1, test_split=0.1, shuffle=True, shuffle_size=10000):
    # Split the batched dataset into train / validation / test partitions
    ds_size = len(ds)

    if shuffle:
        ds = ds.shuffle(shuffle_size, seed=12)

    train_size = int(train_split * ds_size)
    val_size = int(val_split * ds_size)

    train_ds = ds.take(train_size)
    val_ds = ds.skip(train_size).take(val_size)
    test_ds = ds.skip(train_size).skip(val_size)

    return train_ds, val_ds, test_ds

train_ds, val_ds, test_ds = get_dataset_partitions_tf(dataset)

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)
val_ds = val_ds.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)
test_ds = test_ds.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)

resize_and_rescales = Sequential([
    layers.experimental.preprocessing.Resizing(IMAGE_SIZE,IMAGE_SIZE),
    layers.experimental.preprocessing.Rescaling(1.0/255)
])
data_augmentation = Sequential([
    layers.experimental.preprocessing.RandomFlip('horizontal_and_vertical'),
    layers.experimental.preprocessing.RandomRotation(0.2),
])
input_shape = (BATCH_SIZE,IMAGE_SIZE,IMAGE_SIZE,CHANNELS)
n_classes = 10

model = Sequential([
    resize_and_rescales,
    data_augmentation,
    
    layers.Conv2D(32,(3,3), activation='relu',input_shape = input_shape),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64,kernel_size = (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64,kernel_size = (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64,(3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64,(3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64,(3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    
    layers.Flatten(),
    layers.Dense(64,activation='relu'),
    layers.Dense(n_classes, activation='softmax'),
])

model.build(input_shape = input_shape)
model.summary()

A screenshot of the model summary:

[screenshot: model.summary() output]

model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=['accuracy']
)

When I train the model with the following code:

model.fit(
    train_ds,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    verbose=2,
    validation_data=val_ds
)

it stays stuck on the first epoch:

[screenshot: training output stuck at the first epoch]

Tags: tensorflow, machine-learning, deep-learning, conv-neural-network, google-colaboratory

Solution


Check whether TensorFlow is actually using the GPU in Colab. You can also try reducing the batch size (see the sketch below).
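As a rough illustration, here is a minimal sketch of both checks. It reuses the dataset path and IMAGE_SIZE from the question; the smaller batch size of 32 is just an example value, not something prescribed by the original answer.

import tensorflow as tf

# 1. Confirm that the Colab runtime actually exposes a GPU to TensorFlow.
#    An empty list here means training falls back to the CPU; in that case
#    switch the runtime type to GPU (Runtime -> Change runtime type) and rerun.
print(tf.config.list_physical_devices('GPU'))
print(tf.test.gpu_device_name())

# 2. Rebuild the dataset with a smaller batch size (32 is only an example)
#    and train again to see whether the first epoch now makes progress.
IMAGE_SIZE = 256
BATCH_SIZE = 32
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/google-colab-files/PlantVillage",
    seed=123,
    shuffle=True,
    image_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE
)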

