python - Why does my model get an OOM error when using TensorFlow and Keras on the GPU?
Problem description
I'm trying to run my model, but I get the following error:
2021-06-03 01:20:42.015864: W tensorflow/core/common_runtime/bfc_allocator.cc:467] **************************************************************************__________________________
2021-06-03 01:20:42.015984: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at concat_op.cc:158 : Resource exhausted: OOM when allocating tensor with shape[8938,46080] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[8938,46080] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:ConcatV2] name: concat
My code:
import numpy as np
import tensorflow as tf
from cv2 import cv2
from keras.applications.densenet import preprocess_input
from tensorflow import keras
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam, SGD, RMSprop
from tensorflow.keras.metrics import categorical_crossentropy
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.layers import MaxPool2D, MaxPool3D, GlobalAveragePooling2D, Reshape, GlobalMaxPooling2D, MaxPooling2D, Flatten, AveragePooling2D
# physical_devices = tf.config.experimental.list_physical_devices('GPU')
# print("Num GPU Available", len(physical_devices))
# tf.config.experimental.set_memory_growth(physical_devices[0], True)
train_path = 'data/train'
test_path = 'data/test'
batch_size = 16
image_size = (360, 360)
train_batches = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    # rescale=1./255,
    horizontal_flip=True,
    rotation_range=.3,
    width_shift_range=.2,
    height_shift_range=.2,
    zoom_range=.2
).flow_from_directory(directory=train_path,
                      target_size=image_size,
                      color_mode='rgb',
                      batch_size=batch_size,
                      shuffle=True)
test_batches = ImageDataGenerator(
    preprocessing_function=preprocess_input
    # rescale=1./255
).flow_from_directory(directory=test_path,
                      target_size=image_size,
                      color_mode='rgb',
                      batch_size=batch_size,
                      shuffle=True)
# mobile = tf.keras.applications.mobilenet.MobileNet()
mobile = tf.keras.applications.mobilenet_v2.MobileNetV2(include_top=False, weights='imagenet', input_shape=(360, 360, 3))
x = MaxPool2D()(mobile.layers[-1].output)
x = Flatten()(x)
model = Model(inputs=mobile.input, outputs=x)
train_features = model.predict(train_batches, train_batches.labels)
test_features = model.predict(test_batches, test_batches.labels)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_features)
test_scaled = scaler.fit_transform(test_features)
from sklearn.svm import SVC
svm = SVC()
svm.fit(train_scaled, train_batches.labels)
print('train accuracy:')
print(svm.score(train_scaled, train_batches.labels))
print('test accuracy:')
print(svm.score(test_scaled, test_batches.labels))
Solution
The ResourceExhaustedError means the GPU ran out of memory while allocating a tensor. Try lowering the value of batch_size.
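To get a feel for how much memory is involved, here is a rough sketch that estimates the size of the tensor named in the error message. It assumes that "type float" means 4-byte float32 values. The helper name tensor_bytes is made up for this illustration. The first dimension of the failing tensor, 8938, looks like the total number of samples rather than the batch size, so if a smaller batch_size alone does not help, shrinking the 46080 features per image may be worth trying. The last line shows a hypothetical variant where global pooling (for example GlobalAveragePooling2D, which the question already imports) replaces MaxPool2D + Flatten and leaves 1280 values per image, the MobileNetV2 channel count at the default width.

```python
import math


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; float32 uses 4 bytes per element."""
    return math.prod(shape) * bytes_per_element


# Shape copied from the error message: [8938, 46080], type float (assumed float32).
full = tensor_bytes((8938, 46080))
print(f"full feature matrix: {full / 1e9:.2f} GB")

# Features for a single batch of 16 images, for comparison.
one_batch = tensor_bytes((16, 46080))
print(f"one batch of 16:     {one_batch / 1e6:.2f} MB")

# Hypothetical alternative: global pooling leaves 1280 values per image.
pooled = tensor_bytes((8938, 1280))
print(f"pooled features:     {pooled / 1e6:.2f} MB")
```

This is only arithmetic on the shapes. The GPU also has to hold the model weights and intermediate activations, so the real requirement is higher than these numbers.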