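python - CUDA_ERROR_OUT_OF_MEMORY: out of memory with TensorFlow 2.1
Problem description
I'm new to tensorflow-gpu. Training runs fine on the CPU, but I can't get the GPU version to work. Please let me know what I should try next. Thanks a lot!
I'm using Python 3.7.7 with TensorFlow 2.1, installed with: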
conda install tensorflow-gpu
System specs:
Intel(R) core(TM) I5-7440HQ CPU @ 2.80 GHZ
RAM: 8GB
Graphics card specs:
Model: GeForce 930MX
GPU memory: 5.9 GB
Dedicated GPU memory: 2GB
Shared GPU memory: 3.9 GB
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 445.87       Driver Version: 445.87       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 930MX      WDDM  | 00000000:02:00.0 Off |                  N/A |
| N/A   49C    P8    N/A /  N/A |     37MiB /  2048MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU       PID   Type   Process name                             GPU Memory |
|                                                                  Usage      |
I'm running a simple training job on the MNIST dataset with a batch size of 32.
Output in the Jupyter Notebook command prompt:
2020-04-23 12:43:12.448744: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-23 12:43:18.625257: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-23 12:43:18.863674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce 930MX computeCapability: 5.0
coreClock: 1.0195GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 13.41GiB/s
2020-04-23 12:43:18.869948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-23 12:43:18.943177: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-23 12:43:19.004099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-23 12:43:19.030424: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-23 12:43:19.092306: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-23 12:43:19.139074: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-23 12:43:19.264762: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-23 12:43:19.436399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-23 12:43:19.443444: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-04-23 12:43:19.455503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce 930MX computeCapability: 5.0
coreClock: 1.0195GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 13.41GiB/s
2020-04-23 12:43:19.463043: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-23 12:43:19.467340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-23 12:43:19.470920: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-23 12:43:19.477116: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-23 12:43:19.486208: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-23 12:43:19.494696: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-23 12:43:19.505751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-23 12:43:19.515014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-23 12:43:24.165525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-23 12:43:24.169026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-04-23 12:43:24.171068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-04-23 12:43:24.175336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1377 MB memory) -> physical GPU (device: 0, name: GeForce 930MX, pci bus id: 0000:02:00.0, compute capability: 5.0)
2020-04-23 12:43:24.219369: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 1.34G (1444337920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-23 12:43:24.237697: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 1.21G (1299904256 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-23 12:43:24.260040: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 1.09G (1169913856 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-23 12:43:24.284695: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 1004.14M (1052922624 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-23 12:43:24.306355: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 903.73M (947630336 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-23 12:43:24.327752: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 813.36M (852867328 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-23 12:43:24.357554: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 732.02M (767580672 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-23 12:43:24.384318: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 658.82M (690822656 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-23 12:43:24.406377: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 592.94M (621740544 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-23 12:43:24.426737: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 533.64M (559566592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
[I 12:43:26.566 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
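The repeated "failed to allocate" lines above follow a pattern that is easy to check: each retry asks for less memory than the one before. Below is a minimal sketch, using only the standard library, that pulls the byte counts out of lines like these and computes the ratio between consecutive attempts. The name `failed_allocations` and the trimmed sample lines are illustrative and not part of TensorFlow.

```python
import re

# Matches the byte count in lines such as:
#   ... failed to allocate 1.34G (1444337920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY ...
_FAILED_ALLOC = re.compile(r"failed to allocate \S+ \((\d+) bytes\)")


def failed_allocations(log_text):
    """Return the requested sizes (in bytes) of every failed allocation, in log order."""
    return [int(m.group(1)) for m in _FAILED_ALLOC.finditer(log_text)]


# First three attempts from the log above (timestamps and file names trimmed).
sample_log = """\
failed to allocate 1.34G (1444337920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
failed to allocate 1.21G (1299904256 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
failed to allocate 1.09G (1169913856 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
"""

sizes = failed_allocations(sample_log)
# Ratio of each request to the one before it.
ratios = [later / earlier for earlier, later in zip(sizes, sizes[1:])]

for size in sizes:
    print(f"{size / 2**20:8.1f} MiB")
print("retry ratios:", [round(r, 3) for r in ratios])
```

For the lines in this log, every ratio comes out at about 0.9: the requests shrink step by step from 1.34G down to 533.64M, and all of them fail before the kernel restarts.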
Below is the model I'm trying to train. It works fine on the CPU.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D

# X (image array) and y (binary labels) are assumed to be loaded already.
model = Sequential()
model.add(Conv2D(256, (3, 3), input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(256, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X, y, batch_size=32, epochs=3, validation_split=0.3)
Solution
CUDA 11.0 is not compatible with TensorFlow 2.1. Check the compatibility table (TensorFlow's tested build configurations):
Version            Python version   Compiler    Build tools    cuDNN   CUDA
tensorflow-2.1.0   2.7, 3.5-3.7     GCC 7.3.1   Bazel 0.27.1   7.6     10.1
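The row above can be turned into a quick sanity check. This is only a minimal sketch: the dictionary holds nothing but the TensorFlow 2.1.0 row quoted in this answer, and `TESTED_BUILDS` and `cuda_matches` are hypothetical names made up for illustration, not TensorFlow APIs. Rows for other releases would have to be copied from the official table.

```python
# Tested build configuration, copied from the table row quoted above.
# "3.5-3.7" is expanded into the individual minor versions.
TESTED_BUILDS = {
    "2.1.0": {
        "python": ("2.7", "3.5", "3.6", "3.7"),
        "cudnn": "7.6",
        "cuda": "10.1",
    },
}


def cuda_matches(tf_version, cuda_version):
    """Return True if cuda_version is the one listed for tf_version.

    Raises KeyError for releases that are not in TESTED_BUILDS.
    """
    return TESTED_BUILDS[tf_version]["cuda"] == cuda_version


print(cuda_matches("2.1.0", "10.1"))  # True
print(cuda_matches("2.1.0", "11.0"))  # False
```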
TensorFlow 2.1 is compatible with CUDA 10.1, so you have two options.
Option 1
Create a conda environment and install tensorflow-gpu 2.1:
conda create -n tf_gpu
conda activate tf_gpu
Then, inside the environment:
conda install tensorflow-gpu=2.1
Sometimes the following works instead:
conda create --name <some_name> tensorflow-gpu=2.1.0 cudatoolkit=10.1 python=3.6
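Option 2
Uninstall TensorFlow and CUDA 11.0, shut down and restart the computer, then reinstall tensorflow-gpu using the commands above (for a conda-based install) or follow the official instructions to install with pip.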
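After reinstalling with either option, it may help to confirm what TensorFlow actually sees. Below is a minimal sketch, assuming TensorFlow 2.1 or later in the active environment. `describe_gpu_setup` is a hypothetical helper name; `tf.test.is_built_with_cuda` and `tf.config.list_physical_devices` are TensorFlow calls.

```python
def describe_gpu_setup():
    """Collect basic facts about the TensorFlow install in the current environment."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"installed": False}

    gpus = tf.config.list_physical_devices("GPU")
    return {
        "installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


info = describe_gpu_setup()
for key, value in info.items():
    print(f"{key}: {value}")
```

On a working GPU install, `built_with_cuda` should be True and `gpus` should list at least one device. An empty list usually means TensorFlow could not load the GPU and is running on the CPU only.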