TensorFlow freezes after training a few batches

Problem description

I am building a CNN in Python using Keras with TensorFlow 2.0 as the backend. When I try to train the model, it trains for a while but then freezes after a few batches.

If there are 6052 batches, it might freeze at around 400/6052. When this happens, GPU usage drops from 12% to 0%, and I have to end the process from Task Manager.

If I use CNTK as the Keras backend, I have no problems. The freeze only happens with TensorFlow as the backend, and it happens during the first epoch.
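
For context, here is a minimal sketch of how the backend is usually switched in multi-backend Keras (2.x), which reads the `KERAS_BACKEND` environment variable the first time `keras` is imported. The helper name `select_keras_backend` is hypothetical and only for illustration; the sketch assumes the standalone `keras` package rather than `tf.keras`.

```python
import os

# Backends supported by multi-backend Keras 2.x.
SUPPORTED_BACKENDS = {"tensorflow", "cntk", "theano"}


def select_keras_backend(name):
    """Hypothetical helper: choose the Keras backend via KERAS_BACKEND.

    Must be called before the first `import keras`, because Keras reads
    the variable only once, at import time.
    """
    if name not in SUPPORTED_BACKENDS:
        raise ValueError("unknown Keras backend: %r" % name)
    os.environ["KERAS_BACKEND"] = name
    return os.environ["KERAS_BACKEND"]


select_keras_backend("cntk")
# import keras  # would now load with CNTK instead of TensorFlow
```

The same setting can also be made persistent through the `backend` field in `~/.keras/keras.json`.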

How can I keep training my CNN without it freezing after a while?

Here is a sample of my code:

"""
A Convolutional Neural Network class that recognizes handwritten letters. 
"""

import numpy as np

import keras
from keras import backend as K
from keras.models import Sequential, load_model
from keras.layers import Activation, MaxPool2D, Dropout
from keras.layers.core import Dense, Flatten
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import *
from keras.regularizers import l1
from keras.callbacks import CSVLogger, LearningRateScheduler, ReduceLROnPlateau, EarlyStopping, ModelCheckpoint
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
import itertools
import os, shutil
import cv2  
import random
import pickle

batch = 64
train_datagen = ImageDataGenerator(rotation_range=10,  zoom_range = 0.10,  width_shift_range=0.1, height_shift_range=0.1)
train_generator = train_datagen.flow_from_directory('C:\\Users\\user\\Desktop\\Images for CNN\\train', target_size=(28,28), batch_size=batch, color_mode='grayscale', class_mode='categorical')
test_generator = train_datagen.flow_from_directory('C:\\Users\\user\\Desktop\\Images for CNN\\test', target_size=(28,28), batch_size=batch, color_mode='grayscale', class_mode='categorical', shuffle=False)

model = Sequential()

model.add(Conv2D(64, (3,3), input_shape = (28,28, 1), activation="relu"))
model.add(BatchNormalization())

#more hidden layers

model.add(Dense(52, activation='softmax'))

model.summary()
adam = keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.00)
epochs = 2000
annealer = ReduceLROnPlateau(monitor='val_loss', factor=0.99, patience=10)
model.compile(loss="categorical_crossentropy", optimizer=adam, metrics=['accuracy'])
csv_logger = CSVLogger('training.log')
es = EarlyStopping(monitor='val_loss', mode='min', patience=500)
mc = ModelCheckpoint('model.h5', monitor='val_loss', mode='min', save_best_only=True)
model.fit_generator(train_generator, epochs =epochs, validation_data=test_generator, callbacks=[ mc, es, annealer, csv_logger], steps_per_epoch =387360//batch, validation_steps = 411301//batch)
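
The step counts passed to `fit_generator` can be checked by hand. The sketch below uses only the numbers hard-coded in the code above (387360, 411301, and a batch size of 64); whether those match the real number of images on disk is an assumption. The helper name `steps_for` is hypothetical.

```python
def steps_for(num_samples, batch_size, drop_remainder=True):
    """Number of batches needed to cover num_samples items.

    With drop_remainder=True this is plain floor division, as in the
    question; otherwise a final partial batch is counted as well.
    """
    full, rest = divmod(num_samples, batch_size)
    if drop_remainder or rest == 0:
        return full
    return full + 1


batch = 64
train_steps = steps_for(387360, batch)  # same as 387360 // batch
val_steps = steps_for(411301, batch)    # same as 411301 // batch

print(train_steps)  # 6052, the batch count mentioned in the question
print(val_steps)    # 6426

# A freeze near batch 400 is roughly 6.6% of the way into the first epoch.
print(round(100 * 400 / train_steps, 1))  # 6.6
```

If the hard-coded counts drift from the actual dataset, `train_generator.samples // batch` gives the floor-division count directly from the generator.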

Tags: python, windows, tensorflow, keras

Solution

