Why does my model get an OOM error with TensorFlow and Keras on GPU?

Problem description

I am trying to run my model, but I keep running into the following error:

2021-06-03 01:20:42.015864: W tensorflow/core/common_runtime/bfc_allocator.cc:467] **************************************************************************__________________________
2021-06-03 01:20:42.015984: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at concat_op.cc:158 : Resource exhausted: OOM when allocating tensor with shape[8938,46080] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[8938,46080] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:ConcatV2] name: concat

My code:

import numpy as np
import tensorflow as tf
from cv2 import cv2
from keras.applications.densenet import preprocess_input
from tensorflow import keras
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam, SGD, RMSprop
from tensorflow.keras.metrics import categorical_crossentropy
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.layers import MaxPool2D, MaxPool3D, GlobalAveragePooling2D, Reshape, GlobalMaxPooling2D, MaxPooling2D, Flatten, AveragePooling2D

# physical_devices = tf.config.experimental.list_physical_devices('GPU')
# print("Num GPU Available", len(physical_devices))
# tf.config.experimental.set_memory_growth(physical_devices[0], True)

train_path = 'data/train'
test_path = 'data/test'
batch_size = 16
image_size = (360, 360)

train_batches = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    # rescale=1./255,
    horizontal_flip=True,
    rotation_range=.3,
    width_shift_range=.2,
    height_shift_range=.2,
    zoom_range=.2
).flow_from_directory(directory=train_path,
                      target_size=image_size,
                      color_mode='rgb',
                      batch_size=batch_size,
                      shuffle=True)

test_batches = ImageDataGenerator(
    preprocessing_function=preprocess_input
    # rescale=1./255
).flow_from_directory(directory=test_path,
                      target_size=image_size,
                      color_mode='rgb',
                      batch_size=batch_size,
                      shuffle=True)

# mobile = tf.keras.applications.mobilenet.MobileNet()
mobile = tf.keras.applications.mobilenet_v2.MobileNetV2(include_top=False, weights='imagenet', input_shape=(360, 360, 3))

x = MaxPool2D()(mobile.layers[-1].output)
x = Flatten()(x)
model = Model(inputs=mobile.input, outputs=x)

train_features = model.predict(train_batches, train_batches.labels)
test_features = model.predict(test_batches, test_batches.labels)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

train_scaled = scaler.fit_transform(train_features)
test_scaled = scaler.fit_transform(test_features)

from sklearn.svm import SVC
svm = SVC()

svm.fit(train_scaled, train_batches.labels)

print('train accuracy:')
print(svm.score(train_scaled, train_batches.labels))
print('test accuracy:')
print(svm.score(test_scaled, test_batches.labels))

Tags: python, tensorflow, machine-learning, keras, deep-learning

Solution


This error means the GPU has run out of memory. Try lowering the value of batch_size; smaller batches require less GPU memory per step.
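As a minimal sketch of that suggestion, here is the training generator from the question with only batch_size changed (the value 4 is just an illustrative choice; any value smaller than the original 16 can be tried, depending on available GPU memory):

# Minimal sketch, assuming the same directory layout and preprocessing as in the question.
from keras.applications.densenet import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_path = 'data/train'
image_size = (360, 360)
batch_size = 4  # was 16 in the question; smaller batches use less GPU memory per step

train_batches = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    horizontal_flip=True,
    rotation_range=.3,
    width_shift_range=.2,
    height_shift_range=.2,
    zoom_range=.2
).flow_from_directory(directory=train_path,
                      target_size=image_size,
                      color_mode='rgb',
                      batch_size=batch_size,
                      shuffle=True)

Apply the same smaller batch_size to test_batches; the rest of the script from the question stays unchanged. If memory still runs out, lower the value further.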

