Guanghan Ning’s Blog

Preface

I am sharing thoughts with you in this blog.

If something echoes back, I feel most lucky;

if it is to some extent understood... more than satisfied;

if it drains down into the ground, I have nothing to regret.


Build Personal Deep Learning Rig: GTX 1080 + Ubuntu 16.04 + CUDA 8.0RC + CuDnn 7 + Tensorflow/Mxnet/Caffe/Darknet

My internship at TCL is ending soon. Before going back to campus for graduation, I have decided to build myself a personal deep learning rig. I cannot really rely on the machines in the company or in the lab, because ultimately those workstations are not mine, and the development environment can get messed up (it already happened once). With a personal rig, I can conveniently log in to my deep learning workstation with TeamViewer at any time, and I get the chance to build everything from scratch.

In this post, I will go through the whole process of building a deep learning PC, covering both the hardware and the software. By sharing it, I hope it will be helpful to researchers and engineers with the same needs. Since I am building the rig with a GTX 1080, Ubuntu 16.04, CUDA 8.0 RC, and cuDNN, everything is pretty up-to-date. Here is an overview of this article:

Hardware

  • 1. Pick Parts
  • 2. Build the workstation

Software

  • 3. Operating System Installation
    • Preparing bootable installation USB drives
    • Installing systems
  • 4. Deep Learning Environment Installation
    • Remote Control
      • teamviewer
    • Bundle Management
      • anaconda
    • Development Environment
      • python IDE
    • GPU-optimized Environment
      • CUDA
      • CuDnn
    • Deep Learning Frameworks
      • Tensorflow
      • Mxnet
      • Caffe
      • Darknet
  • 5. Docker for Out-of-the-Box Deep Learning Environment
    • Install Docker
    • Install NVIDIA-Docker
    • Download Deep Learning Docker Images
    • Share Data between Host and Container
    • Learn Simple Docker Commands

 


Hardware:

1. Pick Parts

I recommend using PCPartPicker to pick your parts. It helps you find where to buy each part at the lowest price available, and it checks the compatibility of the selected parts for you. They also have a YouTube channel with videos that demonstrate the building process.

In my case, I used their build article as a reference and created a build list myself, which can be found [here]. Here are the parts that I used to build the workstation.

[Table: parts list]
Since we are doing deep learning research, a good GPU is necessary, so I chose the recently released GTX 1080. It was quite hard to buy, but if you check the bundles on Newegg, some sellers offer it in [GPU + motherboard] or [GPU + power supply] bundles. That is the market for you. Buying a bundle is still better than paying a marked-up price for the card alone. In any case, a good GPU makes training or finetuning much faster. Here are some figures showing the advantage of the GTX 1080 over some other GPUs with respect to performance, price, and power efficiency (which saves you electricity every day and the cost of an oversized power supply).
[Figures: GTX 1080 vs. other GPUs in performance, price, and power efficiency]

Note that the GTX 1080 has only 8GB of memory, compared to 12GB on the TITAN X. If you have a bigger budget, you may also consider stacking multiple GPUs; in that case, remember to choose a motherboard with more PCIe slots.

2. Build from Parts

As for building from the parts, I followed the tutorial in [this video]. Although the parts are slightly different, the building process is quite similar. I had no previous experience building a PC on my own, but with this tutorial I was able to get it working within 3 hours. (It should take you less time, but I was extremely cautious, you know.)

[Photo: the assembled workstation]


Software: 

3. Installation of Operating Systems

It is very common to use Ubuntu for deep learning research, but sometimes you may want another operating system as well. For example, if you are also a VR developer with a GTX 1080, you may want Windows 10 for VR development with Unity or similar tools. Here I introduce the installation of both Windows 10 and Ubuntu. If you are only interested in the Ubuntu installation, you can skip installing Windows.

3.1 Preparing bootable installation USB drives
It is very convenient to install operating systems from USB drives, since we all have them. Keep in mind that the USB drive will be formatted, so do not use a portable hard disk you care about. Alternatively, if you have writable DVDs, you can use them to install the operating systems and save them for future use, if you can find them again by then.
Since the process is well illustrated on the official website, you can go to the [Windows 10 page] to learn how to make the USB drive. As for Ubuntu, you can similarly download the ISO and create USB installation media or burn it to a DVD. If you are currently using an Ubuntu system, follow [this tutorial] from the official Ubuntu website. If you are currently using Windows, follow [this tutorial] instead.
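On Linux, one common way to write the ISO to a USB drive is dd (a minimal sketch; the ISO file name and device path below are placeholders, and writing to the wrong device will destroy its data):

    lsblk                               # find the USB device, e.g. /dev/sdX (placeholder)
    sudo dd if=ubuntu-16.04-desktop-amd64.iso of=/dev/sdX bs=4M status=progress
    sync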

3.2 Installing Systems
It is highly recommended that you install Windows first for a dual-boot setup. I will skip the Windows 10 installation, as a detailed guide can be found here: [Windows 10 page]. One thing to note is that you will need the activation key. You can find it on the tag on the bottom of your laptop if it came with Windows 7 or Windows 10 pre-installed.

Installing Ubuntu 16.04 was a little tricky for me, which was kind of a surprise. It was mainly because I did not have the GTX 1080 driver installed at the very beginning. I will share my story with you, in case you encounter the same problems.

Installing Ubuntu
First things first: insert the bootable USB drive. Nothing showed up on my LG monitor except a message saying the frequency was too high. The monitor itself was fine, as I verified on another laptop. I tried connecting the PC to a TV, which did display, but only the bare desktop with no tool panel. I figured out it was a problem with the NVIDIA driver, so I went into the BIOS, set the integrated graphics as the default, and restarted. Remember to move the HDMI cable from the port on the GTX 1080 to the one on the motherboard. After that the display worked fine, and I successfully installed Ubuntu by following the prompts.
[Screenshot: Ubuntu installation]
In order to use the GTX 1080, go to [this page] to get the NVIDIA display driver for Ubuntu. When installing this driver, make sure the GTX 1080 is seated in the motherboard.
The installer then complains, “You appear to be running an X server…”. I followed this link to solve the problem and install the driver. I quote the steps here (a command sketch follows the list):

  • Make sure you are logged out.
  • Hit CTRL+ALT+F1 and log in using your credentials.
  • Kill your current X server session by typing sudo service lightdm stop or sudo stop lightdm.
  • Enter runlevel 3 by typing sudo init 3 and install your *.run file.
  • You might be required to reboot when the installation finishes. If not, run sudo service lightdm start or sudo start lightdm to start your X server again.
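Putting those steps together, here is a minimal command sketch (the driver file name is a placeholder for whatever version you downloaded):

    # after switching to the text console with CTRL+ALT+F1 and logging in:
    sudo service lightdm stop
    sudo init 3
    chmod +x NVIDIA-Linux-x86_64-xxx.run    # placeholder file name
    sudo sh NVIDIA-Linux-x86_64-xxx.run
    sudo service lightdm start              # or simply reboot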

After installing the driver, we can restart and set the GTX 1080 back as the default display in the BIOS. We are good to go.

Some other small problems I encountered are listed here, in case they are helpful:

  • Problem: When I restarted, I could not find the option to boot into Windows.
  • Solution: In Ubuntu, run sudo gedit /boot/grub/grub.cfg and add the following entry:
        menuentry 'Windows 10' {
            set root='hd0,msdos1'
            chainloader +1
        }
  • Problem: Ubuntu does not support the Belkin N300 wireless adapter, which is commonly sold at Best Buy.
  • Solution: Follow the instructions in this link.
  • Problem: When installing TeamViewer, it complains that dependencies are not met.
  • Solution: Refer to this link.

4. Deep Learning Environment 

4.1 Installation of Remote Control (TeamViewer):

sudo dpkg -i teamviewer_11.0.xxxxx_i386.deb
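If dpkg stops with the “dependencies not met” error mentioned earlier, a common fix is to let apt pull in the missing packages and then re-run the install (a sketch, keeping the placeholder file name from above):

    sudo dpkg -i teamviewer_11.0.xxxxx_i386.deb   # may fail and report missing dependencies
    sudo apt-get install -f                       # install the missing dependencies
    sudo dpkg -i teamviewer_11.0.xxxxx_i386.deb   # run the install again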

4.2 Installation of Package Management System (Anaconda):

Anaconda is an easy-to-install free package manager, environment manager, and Python distribution with a collection of over 720 open-source packages and free community support. It can be used to create virtual environments that do not interfere with each other, which is helpful when we use several deep learning frameworks at the same time with different configurations. Using it to install packages is convenient as well. It can be easily installed; follow this.
Some commands to start using a virtual environment (a short example workflow follows):

  • source activate [virtualenv]
  • source deactivate
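For example, a typical workflow looks like this (the environment name is just a placeholder):

    conda create -n my-env python=2.7   # create an isolated environment
    source activate my-env              # switch into it
    conda install numpy scipy           # packages installed here stay inside this environment
    source deactivate                   # leave the environment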

4.3 Installation of Development Environment (Python IDE):

4.3.1 Spyder vs. PyCharm?

Spyder

  • Advantage: MATLAB-like; easy to review intermediate results.

PyCharm:

  • Advantage: modular coding; a more complete IDE for web development frameworks and cross-platform work.

In my personal philosophy, these are merely tools; each one gets used when it comes in handy. I use IDEs to construct the backbone of a project, for example PyCharm for building out the framework. After that, I just modify code with Vim. It is not that Vim is especially powerful or showy; it is simply the one text editor I want to truly master, and there is no need to master two. For special occasions where we need to frequently check I/O, directories, and so on, we might want to use Spyder instead.

4.3.2 Installation:

  • Spyder:
    • You do not need to install Spyder, as it is included in Anaconda.
  • PyCharm:
    • Download it from the official website and just unzip it.
    • Set Anaconda as the project interpreter for PyCharm so that it handles package management. Follow this.
  • vim
    • sudo apt-get install vim
    • The configuration that I currently use: Github
  • Git
    • sudo apt install git
    • git config --global user.name "Guanghan Ning"
    • git config --global user.email "guanghan.ning@gmail.com"
    • git config --global core.editor vim
    • git config --list

 

4.4 Installation of GPU-Optimized Computing Environment (CUDA and CuDNN)
4.4.1 CUDA

  • Install CUDA 8.0 RC
    • There are two reasons to choose version 8.0 over 7.5:
      • CUDA 8.0 gives a performance gain on the GTX 1080 (Pascal) compared to CUDA 7.5.
      • CUDA 7.5 is not offered for Ubuntu 16.04 (there is no download for it on the official website), so CUDA 8.0 is effectively the only choice.
  • CUDA starter Guide
  • CUDA Installer Guide
    • sudo sh cuda_8.0.27_linux.run
    • Follow the command-line prompts
    • As part of the CUDA environment, add the following lines to the ~/.bashrc file in your home folder.
      • export CUDA_HOME=/usr/local/cuda-8.0
      • export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}
      • PATH=${CUDA_HOME}/bin:${PATH}
      • export PATH
    • Check that CUDA is installed (remember to restart the terminal; a fuller sanity check follows this list):
      • nvcc --version
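As the fuller sanity check mentioned above, you can confirm that both the driver and the toolkit see the card (a small sketch, assuming the CUDA samples were installed to their default location by the .run installer):

    nvidia-smi                                       # the GTX 1080 should be listed
    cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
    sudo make
    ./deviceQuery                                    # should end with "Result = PASS"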

4.4.2 cuDNN (CUDA Deep Neural Network library)

  • Install cuDNN
    • Version: cuDNN v5.0 for CUDA 8.0 RC
  • User Guide
  • Install Guide
    • Option one (add the cuDNN path to the environment variables):
      • Extract the folder “cuda”
      • cd <installpath>
      • export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH
    • Option two (copy the cuDNN files into the CUDA folder; if CUDA is set up correctly, it will find cuDNN automatically by relative path; a version check follows this list):
      • tar xvzf cudnn-8.0.tgz
      • cd cuda
      • sudo cp include/cudnn.h /usr/local/cuda/include
      • sudo cp lib64/libcudnn* /usr/local/cuda/lib64
      • sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
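To confirm which cuDNN version ended up in the CUDA folder (assuming option two above), grep the header:

    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2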

4.5 Installation of Deep Learning Frameworks:

4.5.1 TensorFlow / Keras

  • Install TensorFlow first
    • Install with Anaconda
      • conda create -n tensorflow python=3.5
    • Install TensorFlow using pip inside the environment (the pip binaries do NOT support CUDA 8.0 at the moment; I will update this when binaries for CUDA 8.0 come out)
      • source activate tensorflow
      • sudo apt install python3-pip
      • export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0-cp35-cp35m-linux_x86_64.whl
      • pip3 install --upgrade $TF_BINARY_URL
    • Install Tensorflow directly from source
      • install bazel
        • install jdk 8
        • uninstall jdk 9
      • sudo apt-get install python-numpy swig python-dev
      • ./configure
      • build with bazel
        • bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
        • bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
  • Install Keras
  • Use conda to activate/deactivate the virtual environment (a quick GPU check follows this list)
    • source activate tensorflow
    • source deactivate
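Once a GPU-enabled build is installed, a quick way to confirm that TensorFlow actually sees the GTX 1080 from inside the environment (a minimal sketch using the TensorFlow 0.9-era session API; the device placement log should mention /gpu:0):

    source activate tensorflow
    python -c "import tensorflow as tf; tf.Session(config=tf.ConfigProto(log_device_placement=True))"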

4.5.2 Mxnet

  • Create a virtual environment for Mxnet
    • conda create -n mxnet python=2.7
    • source activate mxnet
  • Follow the official website to install mxnet
    • sudo apt-get update
    • sudo apt-get install -y build-essential git libatlas-base-dev libopencv-dev
    • git clone --recursive https://github.com/dmlc/mxnet
    • edit make/config.mk
      • set USE_CUDA = 1, set USE_CUDNN = 1, and set USE_CUDA_PATH to your CUDA directory
    • cd mxnet
    • make clean_all
    • make -j4
      • One problem I encountered was “gcc version later than 5.3 not supported!” My gcc version was 5.4, and I had to remove it:
        • sudo apt-get remove gcc g++
        • conda install -c anaconda gcc=4.8.5
        • gcc --version
  • Python package install for mxnet
    • conda install -c anaconda numpy=1.11.1
    • Method 1:
      • sudo apt-get install python-setuptools
      • cd python; sudo python setup.py install
    • Method 2:
      • cd mxnet
      • cp -r ../mxnet/python/mxnet .
      • cp ../mxnet/lib/libmxnet.so mxnet/
    • Quick test:
      • python example/image-classification/train_mnist.py
    • GPU test (see also the one-liner check after this list):
      • python example/image-classification/train_mnist.py --network lenet --gpus 0
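As the one-liner check mentioned above, you can confirm that MXNet can allocate an array on the GPU (assuming the mxnet environment is active):

    python -c "import mxnet as mx; print(mx.nd.ones((2, 3), ctx=mx.gpu(0)))"
    # should print an NDArray living on gpu(0)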

4.5.3 Caffe
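I have not written this part up in detail. As a rough sketch, the usual build from the official BVLC repository looks like the following, assuming Makefile.config has been edited to enable cuDNN and to point at CUDA 8.0 and your Python installation:

    sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler
    sudo apt-get install libboost-all-dev libgflags-dev libgoogle-glog-dev liblmdb-dev libopencv-dev
    git clone https://github.com/BVLC/caffe
    cd caffe
    cp Makefile.config.example Makefile.config
    # edit Makefile.config: uncomment USE_CUDNN := 1 and set CUDA_DIR := /usr/local/cuda-8.0
    make all -j4
    make pycaffe
    make test -j4 && make runtest   # optional sanity check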

 4.5.4 Darknet

  • This is the easiest of all to install. Just type “make”, and that’s it (a GPU-enabled build sketch follows).
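A minimal sketch of a GPU-enabled build (the GPU and CUDNN switches at the top of Darknet’s Makefile are turned on before compiling):

    git clone https://github.com/pjreddie/darknet
    cd darknet
    # edit the first lines of Makefile: GPU=1, CUDNN=1
    make
    ./darknet    # prints "usage: ./darknet <function>" if the build succeeded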

5. Docker for Out-of-the-Box Deep Learning Environment  

I used to have Caffe, Darknet, MXNet, and TensorFlow all installed and working correctly on Ubuntu 14.04 with a TITAN X (CUDA 7.5), and the projects I did with those frameworks all turned out well. If you want to focus on deep learning research instead of being sidetracked by the peripheral problems you may encounter, it is safer to use such pre-built environments than to adventure with the latest versions. In that case, you should consider isolating each framework in its own environment using Docker. These Docker images can be found on Docker Hub.

5.1 Install Docker 

Unlike virtual machines, a Docker image is built in layers, and the same layers are shared among different images. When we download a new image, components we already have are not re-downloaded, which is more efficient and convenient than replacing a whole virtual machine image. Docker containers are like the run-time instances of Docker images; they can be committed and used to update images, much like Git.

To install Docker on Ubuntu 16.04, follow the instructions on the official website.
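As a minimal sketch, two common routes at the time of writing are the official convenience script and the package from the Ubuntu repositories:

    # official convenience script
    curl -fsSL https://get.docker.com/ | sh
    # or the version packaged by Ubuntu
    sudo apt-get install docker.io
    # verify the installation
    sudo docker run hello-world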

5.2 Install NVIDIA-Docker
Docker containers are both hardware-agnostic and platform-agnostic, but Docker does not natively support NVIDIA GPUs in containers (the hardware is specialized and requires a driver). To solve this, nvidia-docker mounts the devices and driver files when starting the container on the target machine, so the image itself stays agnostic of the NVIDIA driver.
The installation of NVIDIA-Docker can be found here.
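Once nvidia-docker is installed, the standard smoke test is to run nvidia-smi inside a CUDA container (this pulls the nvidia/cuda image from Docker Hub):

    nvidia-docker run --rm nvidia/cuda nvidia-smi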

5.3 Download Deep Learning Docker Images 
I have collected some pre-built Docker images from Docker Hub. They are listed here:

More can easily be found on Docker Hub.

5.4 Share Data between Host and Container
For computer vision researchers, it would be awkward not to be able to see results. For instance, after adding some Picasso style to an image, we would definitely want to see the output images from different epochs. Check out this page to learn how to share data between the host and the container. In a shared directory, we can create projects: on the host, we can write code with text editors or whatever IDEs we prefer, and then run the program in the container. The data in the shared directory can be viewed and processed with the GUI of the host Ubuntu machine.
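In practice this just means mounting a host directory as a volume when starting the container (the paths below are placeholders):

    docker run -it -v /path/on/host:/path/in/container [image_name]
    # the same -v flag works with nvidia-docker as well
    nvidia-docker run -it -v /path/on/host:/path/in/container [image_name]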

5.5 Learn Simple Docker Commands
Don’t be overwhelmed if you are new to Docker. It does not need to be studied systematically unless you want to do that in the future. Here are some simple commands to get you started; they are usually sufficient if you treat Docker as a tool and use it solely for a deep learning environment.

How to check the docker images?

  • docker images: Check all the docker images that you have.

How to check the docker containers?

  • docker ps -a: Check all the containers that you have.
  • docker ps: Check the containers that are running.

How to exit a docker container?

  • (Method 1) In the terminal corresponding to the current container:
    • exit
  • (Method 2) Use [Ctrl + Alt + T] to open a new terminal, or [Ctrl + Shift + T] to open a new terminal tab:
    • docker ps -a: Check the containers you have.
    • docker ps: Check the running container(s).
    • docker stop [container_id]: Stop the container.

How to remove a docker image?

  • docker rmi [docker_image_name]

How to remove a docker container?

  • docker rm [docker_container_name]

How to create your own Docker image based on one from someone else? (Update a container created from an image and commit the result to a new image.)

  • Load the image and open a container.
  • Make some changes inside the container.
  • Commit the changes to an image:
    • docker commit -m "Added changes" -a "Guanghan" 0b2616b0e5a8 ning/cuda-mxnet

Copy data between host and the docker container:

  • docker cp foo.txt mycontainer:/foo.txt
  • docker cp mycontainer:/foo.txt foo.txt

Open a container from a docker image:

  • If the container should be kept, because it will probably be committed:
    • docker run -it [image_name]
  • If the container is only for temporary use:
    • docker run --rm -it [image_name]

21 Comments on "Build Personal Deep Learning Rig: GTX 1080 + Ubuntu 16.04 + CUDA 8.0RC + CuDnn 7 + Tensorflow/Mxnet/Caffe/Darknet"

Qiming Chen
7 years 8 months ago

So helpful! Thank you!

wergerg
7 years 8 months ago

Only an idiot would buy the 1080: the performance difference is only about 20%, the price is double, and after all this trouble TF doesn’t even support CUDA 8.0…

Paul Vilevac
7 years 7 months ago

What is the intended target of this project? It’s quite a nice model for a single machine with hardware-accelerated processing. I wonder what the impact on efficient result generation would be if the same funding were spent on a Pi and Docker swarm. If only I had more time to play.

panovr
7 years 7 months ago

Hi, can a fresh install of Ubuntu 16.04 recognize the GTX 1080 hardware? I saw a post earlier saying the card was not recognized during installation.

Anonymous
7 years 6 months ago

With Ubuntu 16.04, you may need to disable Secure Boot in the BIOS and then install the standard GTX 1080 driver from NVIDIA.

Buklik
7 years 7 months ago

Hi! Thank you for your article! A question: will Ubuntu 14.04, CUDA 8.0, cuDNN v5, and a GTX 1080 work as well?

Enhao Song
7 years 7 months ago

I just bought a desktop with a GTX 1080. This is really helpful information.
Thank you!

SH Ho
7 years 6 months ago

Hi Ning

Thank you for such a superbly written summary! It’s really helpful!

So it’s been a few months since you built this machine, are you happy with it? Do you feel you need to provide extra cooling to the CPU or the case?

Is Ubuntu 16.04 stable and playing nice w/ the HW?

How is your experience w/ GTX1080? Are you glad you got it or do you think it’s inadequate or overkill for most ML and TF datasets?

Thank you and I look forward to your reply!

Jimmy Kumseong Moon
7 years 4 months ago

Wow! it’s all I wanted to know so far.
I found a used GTX 1080 GDDR5x only and have to find the other parts again.

Jimmy Kumseong Moon
7 years 4 months ago

Oops, Sorry. I forgot to say thanks a billion!

STYLIANOS IORDANIS
7 years 4 months ago

Thank you so much! Incredible article.
Can you elaborate on how you managed to successfully install mxnet on Anaconda 2.7?
It has become impossible for me to execute:
import mxnet
I get:
File "/home/steliox/mxnet/python/mxnet/__init__.py", line 7, in
from .base import MXNetError
File "/home/steliox/mxnet/python/mxnet/base.py", line 43, in
_LIB = _load_lib()
File "/home/steliox/mxnet/python/mxnet/base.py", line 35, in _load_lib
lib = ctypes.cdll.LoadLibrary(lib_path[0])
File "/home/steliox/anaconda2/lib/python2.7/ctypes/__init__.py", line 440, in LoadLibrary
return self._dlltype(name)
File "/home/steliox/anaconda2/lib/python2.7/ctypes/__init__.py", line 362, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/steliox/anaconda2/bin/../lib/libgomp.so.1: version `GOMP_4.0' not found (required by /home/steliox/mxnet/python/mxnet/../../lib/libmxnet.so)

J. Lin
7 years 2 months ago

Hi. Just wondering. Is it possible to use the monitor if we are doing deep learning on the GTX 1080? Will it slow down the processing time of the deep learning library?

Alston
6 years 9 months ago

Thanks for sharing. BTW, ssh reverse tunnel could do the better job for remote control! It works even with dynamic IP, only if you have a proxy server.

J Yang
6 years 5 months ago

I train my model with a GTX 1080 Ti + CUDA 7.5 + Caffe. Caffe runs, but the converged loss value is larger than on a GTX 980 Ti. How can I solve this?

O.Brire
5 years 1 month ago

A question about darknet training.
darknet.exe detector train data/voc.data data/yolov3-voc.cfg yolov3-voc_last.weights -gpus 0
In voc.data, only the absolute paths of the images are given, so does darknet.exe find the label data automatically, through the paths in the voc.data file? This is the part I have never quite understood, and none of the documentation explains it clearly. The command above runs fine, but I still don’t really understand the actual details of how it works, and of course I don’t have much time to read the darknet source code.
What I really need is a detailed explanation of the whole VOC2007 directory structure and how the darknet program finds the labels and so on.


Michael
3 years 10 months ago

With the Display Port, Wireless HDMI, and DVI, you have the potential of 3 HDMI outputs with this card. Great for a triple plasma gaming rig?
