Tensorflow with gpu

OS

  • Kubuntu 16.04 LTS

Setup (guide)

Just follow:

  • The walkthrough at the bottom of this page (Walkthrough for CUDA 10.1 (20190602)) is for CUDA 10.1, cuDNN 7.6.1, python3
  • This guide: http://www.python36.com/how-to-install-tensorflow-gpu-with-cuda-9-2-for-python-on-ubuntu/ (Ubuntu 16.04 64-bit, CUDA 9.2, cuDNN 7.1.4, python3)
  • This guide: http://www.python36.com/install-tensorflow141-gpu/ (Ubuntu 16.04 64-bit, CUDA 9.1, cuDNN 7.1.2, python3)

Setup (some details)

  • Check device
~$ lspci | grep NVIDIA
81:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1)
81:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
  • Check driver version:
~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  387.26  Thu Nov  2 21:20:16 PDT 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)
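  • Alternatively, once the NVIDIA driver is installed, the card name and driver version can be queried with nvidia-smi (standard query flags):
~$ nvidia-smi --query-gpu=name,driver_version --format=csv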
  • Install CUDA 9.2 with its patch(es):
# https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=deblocal:
~$ sudo dpkg -i cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
~$ sudo apt-key add /var/cuda-repo-9-2-local/7fa2af80.pub
~$ sudo apt-get update
~$ sudo apt-get install cuda
# INSTALL THE PATCH(ES)
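  • [Optional] Sanity check that the toolkit landed in the default location (assuming /usr/local/cuda-9.2; nvcc is called by its full path here because PATH is only exported in a later step):
~$ ls /usr/local/ | grep cuda
~$ /usr/local/cuda-9.2/bin/nvcc --version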
  • You might need to reboot the PC. If CUDA 9.2 was installed over another version, the NVIDIA tools will throw errors about mismatching driver versions; try
~$ nvidia-smi

Expected (healthy) output:

Wed Jun 13 15:55:44 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0  On |                  N/A |
| 33%   36C    P8     1W /  46W |    229MiB /  2000MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+                                                                             
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1305      G   /usr/lib/xorg/Xorg                           136MiB |
|    0      3587      G   /usr/bin/krunner                               1MiB |
|    0      3590      G   /usr/bin/plasmashell                          67MiB |
|    0      3693      G   /usr/bin/plasma-discover                      20MiB |
+-----------------------------------------------------------------------------+
  • Check the post-installation docs:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions:
# Export paths
~$ export PATH=/usr/local/cuda-9.2/bin${PATH:+:${PATH}}
~$ export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
~$ export LD_LIBRARY_PATH=/usr/local/cuda-9.2/extras/CUPTI/lib64/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
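  • [Optional] The exports above only affect the current shell; to make them persistent they can be appended to ~/.bashrc, e.g.:
~$ echo 'export PATH=/usr/local/cuda-9.2/bin${PATH:+:${PATH}}' >> ~/.bashrc
~$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
~$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.2/extras/CUPTI/lib64/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc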
  • Install TensorFlow (built from sources for CUDA 9.2):
link 1 (preferable guide): http://www.python36.com/install-tensorflow141-gpu/
link 2: https://www.tensorflow.org/install/install_sources
  • [Optional] Install TensorFlow (prebuilt, for CUDA 9.0?):
# docs: 
# - https://www.tensorflow.org/install/install_linux
# some instructions:
# - install cuDNN
~$ sudo apt-get install python3-pip # if it is not already installed
~$ sudo pip3 install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.7.0-cp35-cp35m-linux_x86_64.whl
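  • A quick check that the installed package was built with CUDA and can see the GPU (TF 1.x API), e.g.:
~$ python3 -c "import tensorflow as tf; print(tf.test.is_built_with_cuda()); print(tf.test.is_gpu_available())"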

Testing setup

  • Supported card GeForce GTX 750 Ti (list of supported graphics cards: https://developer.nvidia.com/cuda-gpus):
~$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, World!')  
>>> sess = tf.Session()
2018-04-26 18:14:05.427668: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning 
NUMA node zero
2018-04-26 18:14:05.428033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.1105
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.53GiB
2018-04-26 18:14:05.428061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-26 18:14:05.927106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-26 18:14:05.927149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-26 18:14:05.927163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-26 18:14:05.927313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1289 MB memory) -> physical GPU (device: 0, 
name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
>>> print(sess.run(hello))
b'Hello, World!'
  • Unsupported card GeForce GT 610
~$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, World!')                                                                                                                                                                            
>>> sess = tf.Session()                                                                                                                                                                                                 
2018-04-26 13:00:19.050625: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA                                           
2018-04-26 13:00:19.181581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:                                                                                                             
name: GeForce GT 610 major: 2 minor: 1 memoryClockRate(GHz): 1.62                                                                                                                                                                
pciBusID: 0000:81:00.0                                                                                                                                                                                                           
totalMemory: 956.50MiB freeMemory: 631.69MiB                                                                                                                                                                                              
2018-04-26 13:00:19.181648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1394] Ignoring visible gpu device (device: 0, name: GeForce GT 610, pci bus id: 0000:81:00.0, compute capability: 2.1) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.5.                                                                                                                                                                                                              
2018-04-26 13:00:19.181669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:                                                                                       
2018-04-26 13:00:19.181683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0                                                                                                                                                       
2018-04-26 13:00:19.181695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N                                                                                                                                                       
>>> print(sess.run(hello))                                                                                                                                                                                                                             
b'Hello, World!'
  • As a quick fix, had to install cuDNN 7.0.5 instead of the latest version:
https://stackoverflow.com/questions/49960132/cudnn-library-compatibility-error-after-loading-model-weights
  • Print the TensorFlow version:
>>> print(tf.__version__)
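  • List every device TensorFlow can see (CPU and GPU), e.g. via the TF 1.x device_lib helper:
~$ python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"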

Problems

  • [SOLVED] AttributeError: '_NamespacePath' object has no attribute 'sort'
# Notes:
# Probably appeared after updating some packages (python3?)
# How to reproduce:
1:
~$ python3
>>> import tensorflow
2:
~$ virtualenv --system-site-packages -p python3
# Solution:
~$ sudo pip3 install setuptools --upgrade

Walkthrough for CUDA 10.1 (20190602)

Install CUDA

  • In this guide (https://www.tensorflow.org/install/gpu) there's a link to the CUDA toolkit archive (https://developer.nvidia.com/cuda-toolkit-archive).
    • That toolkit (CUDA Toolkit 10.1 update1 (May 2019)) also updated the system driver to 418.67
    • Reboot

Install cuDNN

  • Have to have an account with NVIDIA - downloaded cuDNN v7.6.1 (June 24, 2019), for CUDA 10.1 (https://developer.nvidia.com/rdp/cudnn-download)
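  • A minimal sketch of NVIDIA's documented "tar file" install method, assuming the downloaded archive is named cudnn-10.1-linux-x64-v7.6.1.34.tgz (check the actual file name of the download):
~$ tar -xzvf cudnn-10.1-linux-x64-v7.6.1.34.tgz
# the archive unpacks into a cuda/ directory; copy the headers and libraries into the CUDA install
~$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
~$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
~$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
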
Option 1: installing tensorflow from source

Basically, follow this guide: https://www.tensorflow.org/install/source. Some key notes:

  • Install Bazel (https://www.tensorflow.org/install/source#install_bazel) - version 0.25.2 (newer will not work)
  • To build, see https://www.tensorflow.org/install/source#download_the_tensorflow_source_code:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r1.14
./configure
 
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
# 4-5 hours later
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip3 install /tmp/tensorflow_pkg/tensorflow-[Tab]
  • Testing:
~$ python3
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, World!')                                                                                                                                                                            
>>> sess = tf.Session()
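  • To confirm that operations actually run on the GPU, device placement logging can be enabled (standard TF 1.x option), e.g.:
~$ python3 -c "import tensorflow as tf; sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)); print(sess.run(tf.constant('Hello, World!')))"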

Option 2: using docker

Follow this guide: https://www.tensorflow.org/install/docker. Key notes:

  • The TensorFlow Docker image requires NVIDIA Docker support; NVIDIA Docker requires apt install nvidia-docker2, and nvidia-docker2 requires apt install docker-ce:
- https://github.com/NVIDIA/nvidia-docker
- https://docs.docker.com/install/linux/docker-ce/ubuntu/
  • Test run:
# Test 1: GPU support inside container:
sudo docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
# Test 2: Test all together
sudo docker pull tensorflow/tensorflow:latest-gpu-py3-jupyter
sudo docker run --runtime=nvidia -it --rm tensorflow/tensorflow:latest-gpu-py3-jupyter python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
# Test 3: Run a local script (and include a local dir) in the container:
https://www.tensorflow.org/install/docker
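  • A sketch of Test 3, roughly following the examples in that guide and assuming a local script ./my_script.py (hypothetical name) in the current directory:
# mounts the current directory into the container and runs the script with the container's python
sudo docker run --runtime=nvidia -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow:latest-gpu-py3-jupyter python ./my_script.py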


Setup walkthrough for CUDA 10.2 (Dec 2019)

Install CUDA

  • In this guide (https://www.tensorflow.org/install/gpu) there's a link to the CUDA toolkit archive (https://developer.nvidia.com/cuda-toolkit-archive).
    • The toolkit (CUDA Toolkit 10.2) also updated the system driver to 440.33.01
    • Will have to reboot

Docker

Instructions

https://www.tensorflow.org/install/docker

Quote:

Docker is the easiest way to enable TensorFlow GPU support on Linux since only the NVIDIA® GPU driver is required on the host machine (the NVIDIA® CUDA® Toolkit does not need to be installed).

Docker images

Where to browse: https://hub.docker.com/r/tensorflow/tensorflow/:

TF version | Python major version | GPU support | NAME:TAG for Docker command
-----------|----------------------|-------------|--------------------------------------
1.15       | 3                    | yes         | tensorflow/tensorflow:1.15.0-gpu-py3
2.0.0+     | 3                    | yes         | tensorflow/tensorflow:latest-gpu-py3
2.0.0+     | 2                    | yes         | tensorflow/tensorflow:latest-gpu
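  • For example, pulling the TF 1.15 / python3 / GPU image from the table:
sudo docker pull tensorflow/tensorflow:1.15.0-gpu-py3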

nvidia-docker

Somehow it was already installed.

  • Check NVIDIA docker version
~$ nvidia-docker version
  • The docs are clear that Docker 19.03+ should use nvidia-docker2; for older Docker versions, nvidia-docker v1 should be used.
  • It's less immediately clear about nvidia-container-runtime; nvidia-docker v1 and v2 should have registered it already.

Notes

  • A local directory can be mounted in 'binding' mode, i.e. files updated locally are also updated inside the Docker container:
# This bind-mounts the directory 'target' located in $(pwd) (the directory the command is run from)
# to /app in the Docker container

~$ docker run \
   -it \
   --rm \
   --name devtest \
   -p 0.0.0.0:6006:6006 \
   --mount type=bind,source="$(pwd)"/target,target=/app \
   --gpus all \
   tensorflow/tensorflow:latest-gpu-py3 \
   bash
  • How to run tensorboard from the container:
# from https://briancaffey.github.io/2017/11/20/using-tensorflow-and-tensor-board-with-docker.html
# From the running container's command line (since it was run with 'bash' in the step above).
# set a correct --logdir
root@e9efee9e3fd3:/# tensorboard --bind_all --logdir=/app/log.txt  # remove --bind_all for TF 1.15
# Then open a browser:
http://localhost:6006

Tensorflow and OpenCV building notes

Build 1

  1. TF 1.15.0
  2. CUDA 10.0 and the CUDA Toolkit (plus related libraries)
  3. OpenCV 3.4.9

TF 1.15.0

  • Will build with Bazel 0.25.2 (installed from the deb archive: https://github.com/bazelbuild/bazel/releases/tag/0.25.2)
  • TF - downloaded as tensorflow-1.15.0.tar.gz from https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0
1. Unpack
2. cd tensorflow-1.15.0
3. ./configure
4.
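Step 4 is left empty above; presumably it is the same bazel invocation as in the CUDA 10.1 walkthrough earlier on this page, i.e. something like:
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip3 install /tmp/tensorflow_pkg/tensorflow-[Tab]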

Build 2

  1. TF 1.13.1
  2. CUDA 10.0 and the CUDA Toolkit (plus related libraries)
  3. OpenCV 3.4.9

TF 1.13.1

  • Will build with Bazel 0.21.0 (installed from the deb archive: https://github.com/bazelbuild/bazel/releases/tag/0.21.0)