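tensorflow - Keras deep learning does not run on the GPU even though tensorflow GPU is installed

Problem description

I want to train a CNN model for image classification using Keras with the tensorflow GPU backend. I have checked, and tensorflow is able to detect the GPU. However, Keras is not using the GPU to train the model. Task Manager also shows 100% CPU utilization and 0% GPU utilization while the model is training.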

I have installed

Visual Studio Community 2017
Python 3.7.3
CUDA 10.0
cuDNN 7.6
Anaconda

I am using Windows 10 64-bit with a GTX 1050 GPU (4 GB) and a 7th-generation Intel i5 CPU.

To install tensorflow GPU, I used the following command

conda create --name tf_gpu tensorflow-gpu

I also tried the following 3 methods to force training onto the GPU

with tensorflow.device('/gpu:0'):
    #code

from keras import backend
assert len(backend.tensorflow_backend._get_available_gpus()) > 0
     #code

from keras import backend as K
K.tensorflow_backend._get_available_gpus()
     #code

Packages installed in my virtual environment

# packages in environment at C:\Users\Sreenivasa Reddy\Anaconda3\envs\tf_gpu:
#
# Name                    Version                   Build  Channel
_tflow_select             2.1.0                       gpu
absl-py                   0.7.1                    py37_0
alabaster                 0.7.12                   py37_0
asn1crypto                0.24.0                   py37_0
astor                     0.7.1                    py37_0
astroid                   2.2.5                    py37_0
attrs                     19.1.0                   py37_1
babel                     2.7.0                      py_0
backcall                  0.1.0                    py37_0
blas                      1.0                         mkl
bleach                    3.1.0                    py37_0
ca-certificates           2019.5.15                     0
certifi                   2019.6.16                py37_0
cffi                      1.12.3           py37h7a1dbc1_0
chardet                   3.0.4                    py37_1
cloudpickle               1.2.1                      py_0
colorama                  0.4.1                    py37_0
cryptography              2.7              py37h7a1dbc1_0
cudatoolkit               10.0.130                      0
cudnn                     7.6.0                cuda10.0_0
decorator                 4.4.0                    py37_1
defusedxml                0.6.0                      py_0
docutils                  0.14                     py37_0
entrypoints               0.3                      py37_0
freetype                  2.9.1                ha9979f8_1
gast                      0.2.2                    py37_0
grpcio                    1.16.1           py37h351948d_1
h5py                      2.9.0            py37h5e291fa_0
hdf5                      1.10.4               h7ebc959_0
icc_rt                    2019.0.0             h0cc432a_1
icu                       58.2                 ha66f8fd_1
idna                      2.8                      py37_0
imagesize                 1.1.0                    py37_0
intel-openmp              2019.4                      245
ipykernel                 5.1.1            py37h39e3cac_0
ipython                   7.6.1            py37h39e3cac_0
ipython_genutils          0.2.0                    py37_0
isort                     4.3.21                   py37_0
jedi                      0.13.3                   py37_0
jinja2                    2.10.1                   py37_0
jpeg                      9b                   hb83a4c4_2
jsonschema                3.0.1                    py37_0
jupyter_client            5.3.1                      py_0
jupyter_core              4.5.0                      py_0
Keras                     2.2.4                     <pip>
keras-applications        1.0.8                      py_0
keras-preprocessing       1.1.0                      py_1
keyring                   18.0.0                   py37_0
lazy-object-proxy         1.4.1            py37he774522_0
libpng                    1.6.37               h2a8f88b_0
libprotobuf               3.8.0                h7bd577a_0
libsodium                 1.0.16               h9d3ae62_0
libtiff                   4.0.10               hb898794_2
markdown                  3.1.1                    py37_0
markupsafe                1.1.1            py37he774522_0
mccabe                    0.6.1                    py37_1
mistune                   0.8.4            py37he774522_0
mkl                       2019.4                      245
mkl_fft                   1.0.12           py37h14836fe_0
mkl_random                1.0.2            py37h343c172_0
mock                      3.0.5                    py37_0
nbconvert                 5.5.0                      py_0
nbformat                  4.4.0                    py37_0
numpy                     1.16.4           py37h19fb1c0_0
numpy-base                1.16.4           py37hc3f5095_0
numpydoc                  0.9.1                      py_0
olefile                   0.46                     py37_0
openssl                   1.1.1c               he774522_1
packaging                 19.0                     py37_0
pandoc                    2.2.3.2                       0
pandocfilters             1.4.2                    py37_1
parso                     0.5.0                      py_0
pickleshare               0.7.5                    py37_0
pillow                    6.1.0            py37hdc69c19_0
pip                       19.1.1                   py37_0
prompt_toolkit            2.0.9                    py37_0
protobuf                  3.8.0            py37h33f27b4_0
psutil                    5.6.3            py37he774522_0
pycodestyle               2.5.0                    py37_0
pycparser                 2.19                     py37_0
pyflakes                  2.1.1                    py37_0
pygments                  2.4.2                      py_0
pylint                    2.3.1                    py37_0
pyopenssl                 19.0.0                   py37_0
pyparsing                 2.4.0                      py_0
pyqt                      5.9.2            py37h6538335_2
pyreadline                2.1                      py37_1
pyrsistent                0.14.11          py37he774522_0
pysocks                   1.7.0                    py37_0
python                    3.7.3                h8c8aaf0_1
python-dateutil           2.8.0                    py37_0
pytz                      2019.1                     py_0
pywin32                   223              py37hfa6e2cd_1
PyYAML                    5.1.1                     <pip>
pyzmq                     18.0.0           py37ha925a31_0
qt                        5.9.7            vc14h73c81de_0
qtawesome                 0.5.7                    py37_1
qtconsole                 4.5.1                      py_0
qtpy                      1.8.0                      py_0
requests                  2.22.0                   py37_0
rope                      0.14.0                     py_0
scipy                     1.2.1            py37h29ff71c_0
setuptools                41.0.1                   py37_0
sip                       4.19.8           py37h6538335_0
six                       1.12.0                   py37_0
snowballstemmer           1.9.0                      py_0
sphinx                    2.1.2                      py_0
sphinxcontrib-applehelp   1.0.1                      py_0
sphinxcontrib-devhelp     1.0.1                      py_0
sphinxcontrib-htmlhelp    1.0.2                      py_0
sphinxcontrib-jsmath      1.0.1                      py_0
sphinxcontrib-qthelp      1.0.2                      py_0
sphinxcontrib-serializinghtml 1.1.3                      py_0
spyder                    3.3.6                    py37_0
spyder-kernels            0.5.1                    py37_0
sqlite                    3.29.0               he774522_0
tensorboard               1.13.1           py37h33f27b4_0
tensorflow                1.13.1          gpu_py37h83e5d6a_0
tensorflow-base           1.13.1          gpu_py37h871c8ca_0
tensorflow-estimator      1.13.0                     py_0
tensorflow-gpu            1.13.1               h0d30ee6_0
termcolor                 1.1.0                    py37_1
testpath                  0.4.2                    py37_0
tk                        8.6.8                hfa6e2cd_0
tornado                   6.0.3            py37he774522_0
traitlets                 4.3.2                    py37_0
urllib3                   1.24.2                   py37_0
vc                        14.1                 h0510ff6_4
vs2015_runtime            14.15.26706          h3a45250_4
wcwidth                   0.1.7                    py37_0
webencodings              0.5.1                    py37_1
werkzeug                  0.15.4                     py_0
wheel                     0.33.4                   py37_0
win_inet_pton             1.1.0                    py37_0
wincertstore              0.2                      py37_0
wrapt                     1.11.2           py37he774522_0
xz                        5.2.4                h2fa13f4_4
zeromq                    4.3.1                h33f27b4_3
zlib                      1.2.11               h62dcd97_3
zstd                      1.3.7                h508b16e_0

Checking whether tensorflow detects the GPU

Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2019-07-22 17:05:26.706907: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-07-22 17:05:26.916585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2019-07-22 17:05:26.923097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-22 17:05:27.594264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-22 17:05:27.598321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-07-22 17:05:27.600418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-07-22 17:05:27.602687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 3011 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17686286348873888351
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3157432729
locality {
  bus_id: 1
  links {
  }
}
incarnation: 5873520528294819841
physical_device_desc: "device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1"
]

My Keras code

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPool2D
from keras.layers import Flatten
from keras.layers import Dense
from keras import backend as K
K.tensorflow_backend._get_available_gpus()
classifier=Sequential()
classifier.add(Convolution2D(32,3,3,input_shape=(32,32,3),activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Convolution2D(32,3,3,activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Convolution2D(64,3,3,activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Flatten())
classifier.add(Dense(output_dim=128, activation='relu'))
classifier.add(Dense(output_dim=1, activation='sigmoid'))
classifier.compile(optimizer='adam',loss='binary_crossentropy', metrics=['accuracy'])

from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
        'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/training_set',
        target_size=(32, 32),
        batch_size=32,
        class_mode='binary')

test_set = test_datagen.flow_from_directory(
        'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/test_set',
        target_size=(32, 32),
        batch_size=32,
        class_mode='binary')

classifier.fit_generator(
        training_set,
        steps_per_epoch=8000,
        epochs=25,
        validation_data=test_set,
        validation_steps=2000)

Output in the IPython console

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPool2D
from keras.layers import Flatten
from keras.layers import Dense

from keras import backend as K
K.tensorflow_backend._get_available_gpus()
Out[15]: ['/job:localhost/replica:0/task:0/device:GPU:0']

classifier=Sequential()
classifier.add(Convolution2D(32,3,3,input_shape=(32,32,3),activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Convolution2D(32,3,3,activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Convolution2D(64,3,3,activation='relu'))
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Flatten())
classifier.add(Dense(output_dim=128, activation='relu'))
classifier.add(Dense(output_dim=1, activation='sigmoid'))
classifier.compile(optimizer='adam',loss='binary_crossentropy', metrics=['accuracy'])

from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
        'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/training_set',
        target_size=(32, 32),
        batch_size=32,
        class_mode='binary')

test_set = test_datagen.flow_from_directory(
        'C:/Users/Sreenivasa Reddy/Desktop/Deep_Learning_A_Z/Volume_1_Supervised_Deep_Learning/Part2_Convolutional_Neural_Networks/Convolutional_Neural_Networks/dataset/test_set',
        target_size=(32, 32),
        batch_size=32,
        class_mode='binary')

classifier.fit_generator(
        training_set,
        steps_per_epoch=8000,
        epochs=25,
        validation_data=test_set,
        validation_steps=2000)
__main__:2: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(32, (3, 3), input_shape=(32, 32, 3..., activation="relu")`
__main__:4: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(32, (3, 3), activation="relu")`
__main__:6: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(64, (3, 3), activation="relu")`
__main__:9: UserWarning: Update your `Dense` call to the Keras 2 API: `Dense(activation="relu", units=128)`
__main__:10: UserWarning: Update your `Dense` call to the Keras 2 API: `Dense(activation="sigmoid", units=1)`
Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/25
 782/8000 [=>............................] - ETA: 17:38 - loss: 0.6328 - acc: 0.6310

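Note: I stopped the kernel after it had run for a while in order to copy the code snippet from the IPython console.

Edit: I trained RNN and ANN models, and when I checked Task Manager during training, CUDA utilization was about 35%, but for the CNN model CUDA utilization is 2%. Isn't 35% CUDA utilization quite low? And why doesn't the CNN reach 35%?

EDIT2: Strangely, when I increase the batch size the model trains very slowly, and when I decrease the batch size (i.e. when I set it to 1) the model trains faster. Is there any explanation for this?

Tags: tensorflow, keras, deep-learning, spyder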

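Solution

I am asking my question here as an answer because I do not yet have the privilege to comment :/

You mentioned that you tried different methods:

"with tensorflow.device('/gpu:0'): #code ...

In the code you posted I cannot see them, or any other way of selecting the GPU, but I assume you used one of them to get the output above?

What happens if you use these methods? Does it still only use the CPU, or do you get an error?

Could you try something like this and post the result: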

import tensorflow as tf

# Creates a graph with its ops pinned to the first GPU.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Runs the op.
print(sess.run(c))

As mentioned in this example: https://dzone.com/articles/how-to-train-tensorflow-models-using-gpus
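To apply the same placement check to the Keras model from the question, one option is to hand Keras a session that logs device placement before the model is built. The following is a minimal sketch, assuming TensorFlow 1.x and standalone Keras 2.x as listed in the question's environment; `enable_placement_logging` and `gpu_device_names` are hypothetical helper names, not part of either library.

```python
def gpu_device_names(device_names):
    """Return only the entries that name a GPU device."""
    return [name for name in device_names if "/device:GPU:" in name]


def enable_placement_logging():
    """Make Keras use a session that logs where each op is placed.

    Assumption: TensorFlow 1.x with standalone Keras 2.x (tf.Session and
    tf.ConfigProto are TF 1.x APIs). The imports are kept inside the
    function so the helper above can be used without TensorFlow installed.
    """
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto(log_device_placement=True)
    K.set_session(tf.Session(config=config))


# The device string below is copied from the IPython output in the question.
reported = ["/job:localhost/replica:0/task:0/device:GPU:0"]
print(gpu_device_names(reported))
```

Calling `enable_placement_logging()` before `classifier=Sequential()` should make each op's assigned device appear in the console once training starts, which shows whether the convolution ops land on `/device:GPU:0`.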

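Edit: regarding your utilization question.

There can be several reasons for this, for example:

You could try increasing the batch size, which is fairly small. In many cases a small batch size leaves the GPU idle because it is waiting for data from the CPU (which would also explain your 100% CPU usage). In addition, your training set is very small (only 8000 samples). If you just want to increase GPU usage, you could set the batch size to 512 or even 1024 and artificially enlarge your sample size (for example by duplicating your samples several times). Note, however, that this will not give you a good model; it is only a way to increase GPU usage!
Your network is very small, so you will not gain much from GPU acceleration. You could try increasing the size of the network to test whether GPU usage goes up.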
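The batch-size advice above interacts with how the `fit_generator` call in the question is configured. As a rough sketch, assuming Keras 2 semantics in which `steps_per_epoch` counts batches rather than images, the arithmetic below shows how many images one epoch processes for the values in the question (8000 training images, `steps_per_epoch=8000`). The helper names `epoch_workload` and `steps_for_one_pass` are made up for illustration.

```python
import math

TRAIN_IMAGES = 8000      # "Found 8000 images belonging to 2 classes."
STEPS_PER_EPOCH = 8000   # value passed to fit_generator in the question


def epoch_workload(batch_size, steps=STEPS_PER_EPOCH, images=TRAIN_IMAGES):
    """Return (images processed per epoch, passes over the dataset)."""
    processed = steps * batch_size
    return processed, processed / images


def steps_for_one_pass(batch_size, images=TRAIN_IMAGES):
    """Number of batches needed to see every image exactly once."""
    return math.ceil(images / batch_size)


for batch_size in (1, 32, 512):
    processed, passes = epoch_workload(batch_size)
    print(batch_size, processed, passes, steps_for_one_pass(batch_size))
```

Under this assumption, 8000 steps at batch size 32 correspond to 256,000 images, or 32 passes over the data per epoch, while a single pass needs only 250 steps. Because the step count is fixed, the work per epoch grows with the batch size, which may be part of why larger batches looked slower in EDIT2.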

This is also mentioned in "Very low GPU usage during training in Tensorflow".

I hope this helps.
