tensorflow - Cannot use GeForce GTX 1070 with TensorFlow 2.3.1
Problem description
System specs:
Ubuntu 20.04
NVIDIA Corporation GP104BM [GeForce GTX 1070 Mobile]
Intel® Core™ i7-7700HQ CPU @ 2.80GHz × 8
RAM: 31.3 GiB
NVIDIA driver version: 450.80.02
tensorflow-2.3.1
CUDA Version: 11.0
cudnn-11.0
user@Linux:~$ nvidia-smi
Sat Nov 21 15:52:38 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 00000000:01:00.0 On | N/A |
| N/A 50C P8 9W / N/A | 297MiB / 8111MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 932 G /usr/lib/xorg/Xorg 92MiB |
| 0 N/A N/A 1295 G /usr/bin/gnome-shell 111MiB |
| 0 N/A N/A 3277 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 3828 C /usr/bin/python3 87MiB |
+-----------------------------------------------------------------------------+
user@Linux:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 11676041203334616666
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 9716226955421748203
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 1948208650029266498
physical_device_desc: "device: XLA_GPU device"
]
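Note that the device list above contains XLA_CPU and XLA_GPU entries but no regular /device:GPU:0, which is why training falls back to the CPU. As a quick check (a minimal sketch using the standard tf.config API, not part of the original post), you can ask TensorFlow 2.x directly which physical GPUs it can use:

import tensorflow as tf

# An empty list here means TensorFlow could not load the CUDA/cuDNN
# libraries and will silently run on the CPU instead.
gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs available:", len(gpus))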
When I run this code, I get an error:
with tf.device('/device:XLA_GPU:0'):
history=model.fit(train_generator,
epochs=10,
validation_data=valid_generator,
validation_steps=test.shape[0]//batch_size,
steps_per_epoch=train.shape[0]//batch_size
)
Epoch 1/10
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-15-cc03ad3613b3> in <module>
1 with tf.device('/device:XLA_GPU:0'):
----> 2 history=model.fit(train_generator,
3 epochs=10,
4 validation_data=valid_generator,
5 validation_steps=test.shape[0]//batch_size,
...
Without the with tf.device('/device:XLA_GPU:0') context, everything runs without errors, but training takes a very long time (the GPU is not being used...).
Solution
Replace XLA_GPU:0 with GPU:0. The XLA_GPU device reported by device_lib is an internal XLA device and should not be targeted directly with tf.device; use the regular GPU device name instead.
Use the snippet below to avoid this error.
with tf.device('/device:GPU:0'):
history=model.fit(train_generator,
epochs=10,
validation_data=valid_generator,
validation_steps=test.shape[0]//batch_size,
steps_per_epoch=train.shape[0]//batch_size
)
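One way to confirm that operations really land on the GPU after this change is to enable device-placement logging before building the model (a standard TensorFlow 2.x call; this verification step is an addition, not part of the original answer):

import tensorflow as tf

# Print the device each op is placed on; with a working CUDA setup,
# ops should report .../device:GPU:0 rather than .../device:CPU:0.
tf.debugging.set_log_device_placement(True)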