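tensorflow - Segmentation fault (core dumped) on Ubuntu 18.04 with an RTX 2080 Ti
Problem description
I recently bought an RTX 2080 Ti so that I can run some deep learning projects locally. I have tried several times to install tensorflow-gpu on Ubuntu 18.04, and the only guide that worked was this one: https://www.pugetsystems.com/labs/hpc/Install-TensorFlow-with-GPU-Support-the-Easy-Way-on-Ubuntu-18-04-without-installing-CUDA-1170/#look-at-the-job-run-with-tensorboard
However, when I start running my script, I get the following error: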
Using TensorFlow backend.
Train on 60000 samples, validate on 10000 samples
2019-01-09 14:49:06.748318: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-09 14:49:07.730143: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-09 14:49:07.732970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
totalMemory: 10.73GiB freeMemory: 10.23GiB
2019-01-09 14:49:07.733071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-09 14:49:30.666591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-09 14:49:30.666636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-09 14:49:30.666646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-09 14:49:30.667094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9875 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
Epoch 1/15
Segmentation fault (core dumped)
Can anyone give me some feedback on how to get TensorFlow working properly on my GPU?
Thank you.
Solution
You can try this.
I'm on: RTX 2080, Ubuntu 16.04
You need to install:
cuda 10.0
cuDNN v7.4.1.5
libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64
libcudnn7-doc_7.4.1.5-1+cuda10.0_amd64
libcudnn7_7.4.1.5-1+cuda10.0_amd64
nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64
nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:02:00.0 Off |                  N/A |
| 22%   39C    P0    N/A /  N/A |      0MiB /  7951MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
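nvidia-smi shows CUDA 10.1, but that is not the installed toolkit version: nvidia-smi reports the highest CUDA version the driver supports. The installed toolkit is 10.0, as nvcc confirms.
nvcc --version: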
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
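To avoid reading the toolkit version off the wrong tool, the check above can be scripted. This is only a minimal sketch: the helper names `parse_nvcc_release` and `installed_toolkit_release` are made up for illustration, and it assumes `nvcc` is on your PATH (it returns None otherwise).

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return the toolkit release (e.g. '10.0') found in `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_toolkit_release():
    """Run `nvcc --version` if nvcc is on PATH and parse the release; None otherwise."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=False,
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # Hypothetical usage: compare this against the CUDA version your TensorFlow build expects.
    print("CUDA toolkit release:", installed_toolkit_release())
```

For the nvcc output shown above, `parse_nvcc_release` would pick up the "release 10.0" part of the last line.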
You can get everything step by step here:
1. NVIDIA Linux driver: https://www.nvidia.com/Download/index.aspx?lang=en-us
2. cuda: https://developer.nvidia.com/cuda-downloads
3. cudnn: https://developer.nvidia.com/rdp/cudnn-download
4. install: libcudnn7-dev, libcudnn7-doc, libcudnn7
5. install: nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
Download libcudnn and nvidia-machine-learning from:
https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/
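After installing, it may be worth confirming that the cuDNN packages listed above were built for the same CUDA version as your toolkit. The sketch below is an assumption-laden example, not an official procedure: it asks `dpkg-query` for the installed version of each package and splits a version string such as `7.4.1.5-1+cuda10.0` into its cuDNN and CUDA parts. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Package names taken from the list above.
PACKAGES = ["libcudnn7", "libcudnn7-dev", "libcudnn7-doc"]


def split_cudnn_version(version):
    """Split '7.4.1.5-1+cuda10.0' into ('7.4.1.5', '10.0'); None if the format differs."""
    match = re.match(r"^(\d+(?:\.\d+)*)-\d+\+cuda(\d+\.\d+)$", version)
    return (match.group(1), match.group(2)) if match else None


def installed_version(package):
    """Return the version dpkg reports for a package, or None if it cannot be determined."""
    if shutil.which("dpkg-query") is None:
        return None
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Version}", package],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    for name in PACKAGES:
        version = installed_version(name)
        parts = split_cudnn_version(version) if version else None
        print(name, version, parts)
```

If the CUDA part printed here differs from the release that nvcc reports, the cuDNN packages were built for a different toolkit than the one installed.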
I am using:
tensorflow (1.13.1) tensorflow-gpu (1.13.1) tf-nightly-gpu (1.14.1.dev20190509)
In the code itself: I got an LSTM in TensorFlow working on the GPU when the script starts with the following:
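import tensorflow as tf  # TensorFlow 1.x API
import keras  # assumes standalone Keras with the TensorFlow backend

# Allocate GPU memory on demand instead of reserving all of it at startup
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
keras.backend.set_session(sess)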
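Before training, you may also want to confirm that TensorFlow actually sees the GPU. A minimal sketch, assuming TensorFlow 1.x is importable: `device_lib.list_local_devices()` is a TensorFlow function, while `gpu_device_names` and `local_devices` are hypothetical helpers written for this example.

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_devices():
    """List the devices TensorFlow can see; empty list if TensorFlow is not importable."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return device_lib.list_local_devices()


if __name__ == "__main__":
    # An empty list means TensorFlow would fall back to the CPU.
    print("GPUs visible to TensorFlow:", gpu_device_names(local_devices()))
```

If the list is empty even though nvidia-smi shows the card, the CUDA or cuDNN libraries are probably not the versions your TensorFlow build expects.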