tensorflow - TensorFlow with a Quadro M1200 GPU on Windows 10
Problem Description
I was able to install the NVIDIA driver for my Quadro M1200 and run TensorFlow 2.3.0.
Testing that TensorFlow sees the GPU works:
In [2]: tf.config.list_physical_devices('GPU')
It returns Out[2]: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
along with some details such as:
- Successfully opened dynamic library nvcuda.dll
- Found device 0 with properties: pciBusID: 0000:01:00.0 name: Quadro M1200 computeCapability: 5.0
- Successfully opened dynamic library cudart64_101.dll
- Successfully opened dynamic library cublas64_10.dll
- Successfully opened dynamic library cufft64_10.dll
- Successfully opened dynamic library curand64_10.dll
- Successfully opened dynamic library cusolver64_10.dll
- Successfully opened dynamic library cusparse64_10.dll
- Successfully opened dynamic library cudnn64_7.dll
- Adding visible gpu devices: 0
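As a side note, the DLL names in that log encode which toolkit versions this TensorFlow build was compiled against — `cudart64_101.dll` corresponds to CUDA 10.1 and `cudnn64_7.dll` to cuDNN major version 7. A minimal sketch that pulls the library name and version digits out of such filenames (the naming convention here is inferred from these log lines, not an official API):

```python
import re

def decode_dll(name):
    """Split a CUDA/cuDNN DLL filename like 'cudart64_101.dll' into
    (library, version digits). For CUDA toolkit DLLs the digits are
    major+minor ('101' -> 10.1); cuDNN DLLs carry only the major ('7')."""
    m = re.match(r"([a-z]+)64_(\d+)\.dll", name)
    return m.groups() if m else None

print(decode_dll("cudart64_101.dll"))  # ('cudart', '101') -> CUDA 10.1
print(decode_dll("cudnn64_7.dll"))     # ('cudnn', '7')    -> cuDNN 7.x
```

So the log above confirms the runtime found a CUDA 10.1 / cuDNN 7 installation, matching what TF 2.3 looks for.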
However, TensorFlow crashes when I create a model. Here is the code:
In [4]: model = keras.Sequential([
   ...:     keras.layers.Flatten(input_shape=(28, 28)),
   ...:     keras.layers.Dense(128, activation='relu'),
   ...:     keras.layers.Dense(10)
   ...: ])
2020-09-26 14:48:36.858359: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-26 14:48:36.936376: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x25353e19ea0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-26 14:48:36.945965: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-26 14:48:36.956140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro M1200 computeCapability: 5.0
coreClock: 1.148GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-09-26 14:48:36.971039: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-26 14:48:36.977439: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-26 14:48:36.984968: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-26 14:48:36.991871: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-26 14:48:37.000114: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-26 14:48:37.007128: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-26 14:48:37.018878: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-26 14:48:37.025486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-26 14:48:37.602596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-26 14:48:37.610835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-09-26 14:48:37.617839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-09-26 14:48:37.625390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3119 MB memory) -> physical GPU (device: 0, name: Quadro M1200, pci bus id: 0000:01:00.0, compute capability: 5.0)
2020-09-26 14:48:37.649822: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2535430b520 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-26 14:48:37.661985: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Quadro M1200, Compute Capability 5.0
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-4-0eeba0e16040> in <module>
----> 1 model = keras.Sequential([
2 keras.layers.Flatten(input_shape=(28, 28)),
3 keras.layers.Dense(128, activation='relu'),
4 keras.layers.Dense(10)
5 ])
~\Miniconda3\lib\site-packages\tensorflow\python\training\tracking\base.py in _method_wrapper(self, *args, **kwargs)
455 self._self_setattr_tracking = False # pylint: disable=protected-access
456 try:
--> 457 result = method(self, *args, **kwargs)
458 finally:
459 self._self_setattr_tracking = previous_value # pylint: disable=protected-access
~\Miniconda3\lib\site-packages\tensorflow\python\keras\engine\sequential.py in __init__(self, layers, name)
114 """
115 # Skip the init in FunctionalModel since model doesn't have input/output yet
--> 116 super(functional.Functional, self).__init__( # pylint: disable=bad-super-call
117 name=name, autocast=False)
118 self.supports_masking = True
~\Miniconda3\lib\site-packages\tensorflow\python\training\tracking\base.py in _method_wrapper(self, *args, **kwargs)
455 self._self_setattr_tracking = False # pylint: disable=protected-access
456 try:
--> 457 result = method(self, *args, **kwargs)
458 finally:
459 self._self_setattr_tracking = previous_value # pylint: disable=protected-access
~\Miniconda3\lib\site-packages\tensorflow\python\keras\engine\training.py in __init__(self, *args, **kwargs)
306 self._steps_per_execution = None
307
--> 308 self._init_batch_counters()
309 self._base_model_initialized = True
310 _keras_api_gauge.get_cell('model').set(True)
~\Miniconda3\lib\site-packages\tensorflow\python\training\tracking\base.py in _method_wrapper(self, *args, **kwargs)
455 self._self_setattr_tracking = False # pylint: disable=protected-access
456 try:
--> 457 result = method(self, *args, **kwargs)
458 finally:
459 self._self_setattr_tracking = previous_value # pylint: disable=protected-access
~\Miniconda3\lib\site-packages\tensorflow\python\keras\engine\training.py in _init_batch_counters(self)
315 # `evaluate`, and `predict`.
316 agg = variables.VariableAggregationV2.ONLY_FIRST_REPLICA
--> 317 self._train_counter = variables.Variable(0, dtype='int64', aggregation=agg)
318 self._test_counter = variables.Variable(0, dtype='int64', aggregation=agg)
319 self._predict_counter = variables.Variable(
~\Miniconda3\lib\site-packages\tensorflow\python\ops\variables.py in __call__(cls, *args, **kwargs)
260 return cls._variable_v1_call(*args, **kwargs)
261 elif cls is Variable:
--> 262 return cls._variable_v2_call(*args, **kwargs)
263 else:
264 return super(VariableMetaclass, cls).__call__(*args, **kwargs)
~\Miniconda3\lib\site-packages\tensorflow\python\ops\variables.py in _variable_v2_call(cls, initial_value, trainable, validate_shape, caching_device, name, variable_def, dtype, import_scope, constraint, synchronization, aggregation, shape)
242 if aggregation is None:
243 aggregation = VariableAggregation.NONE
--> 244 return previous_getter(
245 initial_value=initial_value,
246 trainable=trainable,
~\Miniconda3\lib\site-packages\tensorflow\python\ops\variables.py in <lambda>(**kws)
235 shape=None):
236 """Call on Variable class. Useful to force the signature."""
--> 237 previous_getter = lambda **kws: default_variable_creator_v2(None, **kws)
238 for _, getter in ops.get_default_graph()._variable_creator_stack: # pylint: disable=protected-access
239 previous_getter = _make_getter(getter, previous_getter)
~\Miniconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py in default_variable_creator_v2(next_creator, **kwargs)
2631 shape = kwargs.get("shape", None)
2632
-> 2633 return resource_variable_ops.ResourceVariable(
2634 initial_value=initial_value,
2635 trainable=trainable,
~\Miniconda3\lib\site-packages\tensorflow\python\ops\variables.py in __call__(cls, *args, **kwargs)
262 return cls._variable_v2_call(*args, **kwargs)
263 else:
--> 264 return super(VariableMetaclass, cls).__call__(*args, **kwargs)
265
266
~\Miniconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py in __init__(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, variable_def, import_scope, constraint, distribute_strategy, synchronization, aggregation, shape)
1505 self._init_from_proto(variable_def, import_scope=import_scope)
1506 else:
-> 1507 self._init_from_args(
1508 initial_value=initial_value,
1509 trainable=trainable,
~\Miniconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py in _init_from_args(self, initial_value, trainable, collections, caching_device, name, dtype, constraint, synchronization, aggregation, distribute_strategy, shape)
1659 else:
1660 shape = initial_value.shape
-> 1661 handle = eager_safe_variable_handle(
1662 initial_value=initial_value,
1663 shape=shape,
~\Miniconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py in eager_safe_variable_handle(initial_value, shape, shared_name, name, graph_mode)
240 """
241 dtype = initial_value.dtype.base_dtype
--> 242 return _variable_handle_from_shape_and_dtype(
243 shape, dtype, shared_name, name, graph_mode, initial_value)
244
~\Miniconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py in _variable_handle_from_shape_and_dtype(shape, dtype, shared_name, name, graph_mode, initial_value)
172 # compatible with ASYNC execution mode. Further, since not all devices
173 # support string tensors, we encode the assertion string in the Op name
--> 174 gen_logging_ops._assert( # pylint: disable=protected-access
175 math_ops.logical_not(exists), [exists], name="EagerVariableNameReuse")
176
~\Miniconda3\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py in _assert(condition, data, summarize, name)
47 return _result
48 except _core._NotOkStatusException as e:
---> 49 _ops.raise_from_not_ok_status(e, name)
50 except _core._FallbackException:
51 pass
~\Miniconda3\lib\site-packages\tensorflow\python\framework\ops.py in raise_from_not_ok_status(e, name)
6841 message = e.message + (" name: " + name if name is not None else "")
6842 # pylint: disable=protected-access
-> 6843 six.raise_from(core._status_to_exception(e.code, message), None)
6844 # pylint: enable=protected-access
6845
~\Miniconda3\lib\site-packages\six.py in raise_from(value, from_value)
InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
Unhandled exception in event loop:
File "C:\Users\AMartinez\Miniconda3\lib\asyncio\proactor_events.py", line 768, in _loop_self_reading
f.result() # may raise
File "C:\Users\AMartinez\Miniconda3\lib\asyncio\windows_events.py", line 808, in _poll
value = callback(transferred, key, ov)
File "C:\Users\AMartinez\Miniconda3\lib\asyncio\windows_events.py", line 457, in finish_recv
raise ConnectionResetError(*exc.args)
Exception [WinError 995] The I/O operation has been aborted because of either a thread exit or an application request
Press ENTER to continue...
In [5]:
I have tried several cuDNN versions, but none of them solved the problem:
cudnn-10.1-windows10-x64-v7.6.1.34.zip
cudnn-10.1-windows10-x64-v7.6.2.24.zip
cudnn-10.1-windows10-x64-v7.6.3.30.zip
Is there any way to solve this problem?
Solution
The error comes from an incompatibility between the NVIDIA/CUDA driver and TF 2.3. I upgraded TF to 2.4 together with the NVIDIA driver, and it works. I also managed to get it working with earlier TF 2 releases.
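For reference, each TensorFlow release is tested against specific CUDA/cuDNN versions, so upgrading TF usually means upgrading the toolkit and driver in lockstep. A small sketch of the relevant pairs (taken from TensorFlow's published tested-build-configurations table for Windows; verify against the official docs for your exact release):

```python
# TF release -> (CUDA toolkit, cuDNN) pairs from TensorFlow's
# tested build configurations for Windows GPU builds.
TESTED_CONFIGS = {
    "2.3": ("10.1", "7.6"),
    "2.4": ("11.0", "8.0"),
}

def required_toolkits(tf_version):
    """Return the (CUDA, cuDNN) pair a given TF 2.x release was tested with,
    or None if the release is not in the table."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_CONFIGS.get(major_minor)

print(required_toolkits("2.3.0"))  # ('10.1', '7.6')
print(required_toolkits("2.4.1"))  # ('11.0', '8.0')
```

In other words, moving from TF 2.3 to TF 2.4 also requires moving from CUDA 10.1 / cuDNN 7.6 to CUDA 11.0 / cuDNN 8.0, plus a driver new enough for CUDA 11.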