NovelEssay.com Programming Blog

Exploration of Big Data, Machine Learning, Natural Language Processing, and other fun problems.

Installing Python Chainer and Theano on Windows with Anaconda for GPU Processing

Let's say you want to do some GPU processing on Windows, and you want to do it in Python.


We'll show the setup steps for installing Python Chainer and Theano on Windows 10 in this blog article.


Some Terms:

CUDA - a parallel computing platform and API created by Nvidia for GPU processing.

cuDNN - a GPU-accelerated neural network primitives library for CUDA

Chainer - a Python neural network framework package

Theano - a Python deep learning package


Initial Hardware and OS Requirements:

You need an Nvidia CUDA-supported video card. (I have an Nvidia GeForce GTX 750 Ti.) Check for your GPU card in the support list found here: https://developer.nvidia.com/cuda-gpus

You need Windows 10. (Everything in this procedure is x64.)


Important: 

Versions matter a lot. I tried to do this exact same setup with Python 2.7, and I was not successful. I tried the same thing with Anaconda 2, and that didn't work. I tried the same thing with cuDNN 5.1, and that didn't work. So many combinations didn't work for me that I decided to write about the one that did.


Procedure:

1) Install Visual Studio 2015. You must install Visual Studio before installing the CUDA Toolkit. You need the \bin\cl.exe compiler. I have the VS2015 Enterprise Edition, but the VS2015 Community Edition is free here: https://www.microsoft.com/en-us/download/details.aspx?id=48146


2) Install the CUDA Toolkit found here: https://developer.nvidia.com/cuda-downloads

That installs v8.0 to a path like this: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0


3) Download the cuDNN v5.0 here: https://developer.nvidia.com/cudnn

There is a v5.1 there, but it did not work for me. Feel free to try it, but I suggest trying v5.0 first.

The cuDNN download is just three files. You'll want to drop them into the CUDA install path:

  • Drop the cudnn.h file in the folder:  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include\
  • Drop the cudnn64_5.dll file in the folder:  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\
  • Drop the cudnn.lib file in the folder:  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\
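
Once you have Python installed (step 4 below), a quick way to sanity-check the placement is a sketch like this (it assumes the default v8.0 toolkit path from step 2):

import os

# Confirm the three cuDNN files landed in the CUDA v8.0 tree.
# Adjust cuda_root if your toolkit installed somewhere else.
cuda_root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0"
expected = [
    os.path.join(cuda_root, "include", "cudnn.h"),
    os.path.join(cuda_root, "bin", "cudnn64_5.dll"),
    os.path.join(cuda_root, "lib", "x64", "cudnn.lib"),
]
for path in expected:
    print("OK" if os.path.isfile(path) else "MISSING", path)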

4) Install Anaconda3 (with Python 3.6) for Windows x64, found here: https://repo.continuum.io/archive/Anaconda3-4.3.1-Windows-x86_64.exe

In case that link breaks, this is the page I found it at: https://www.continuum.io/downloads

You'll be doing most of your Anaconda/Python work in the Anaconda Console window. If Windows does not give you a nice link to the Anaconda Console, make a shortcut with a target that looks like this:

"%windir%\system32\cmd.exe " "/K" C:\ProgramData\Anaconda3\Scripts\activate.bat C:\ProgramData\Anaconda3

I installed Anaconda for "All Users", so it installed to C:\ProgramData. If you install for just one user, Anaconda goes under a C:\Users\<your name>\ path instead.
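
To confirm the Anaconda install is the Python you're actually getting (64-bit, version 3.6), you can run a quick check from the Anaconda console. This is just a sketch that prints the interpreter's version, bitness, and location:

import platform
import sys

# Should report Python 3.6.x, 64bit, and a path under your Anaconda3 folder.
print(sys.version)
print(platform.architecture()[0])
print(sys.executable)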


5) Building some of these Python packages from source requires a gcc/g++ compiler. Install MinGW-w64 here: https://sourceforge.net/projects/mingw-w64/

WARNING: During this install, be sure to pick the x86_64 install and not the i686 install!

The default install for MinGW is at c:\Program Files\mingw-w64\x86_64-6.3.0-posix-seh-rt_v5-rev1\mingw64\bin

The space in "Program Files" will break builds later, so move the MinGW folder to a path without spaces, something like this instead:

C:\mingw-w64\x86_64-6.3.0-posix-seh-rt_v5-rev1\mingw64\bin


6) Environment variables and paths!

If you have no idea how to set environment variables in Windows, here's a link that describes how to do that: http://www.computerhope.com/issues/ch000549.htm

Add a variable called "CFlags" with this value:

  • -IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include

Add a variable called "CUDA_PATH" with this value:

  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0

Add a variable called "LD_LIBRARY_PATH" with this value:

  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64

Add a variable called "LDFLAGS" with this value:

  • -LC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64

Add all of the following to your PATH variable (or ensure they exist):

  • C:\mingw-w64\x86_64-6.3.0-posix-seh-rt_v5-rev1\mingw64\bin
  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin
  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\libnvvp
  • C:\Program Files (x86)\Microsoft Visual Studio 14.0\vc\bin
  • C:\ProgramData\Anaconda3
  • C:\ProgramData\Anaconda3\Scripts
  • C:\ProgramData\Anaconda3\Library\bin

(The Anaconda3 paths might get set automatically for you.)
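
Once the variables are in place, open a fresh Anaconda console (so it picks up the changes) and run a quick check. This is just a sketch that prints the variables from this step and confirms the compilers and CUDA tools are reachable on PATH:

import os
import shutil
import subprocess

# Print the CUDA-related variables set in this step.
for var in ("CUDA_PATH", "CFlags", "LDFLAGS", "LD_LIBRARY_PATH"):
    print(var, "=", os.environ.get(var, "<not set>"))

# Confirm the tools added to PATH can actually be found.
for tool in ("nvcc", "cl", "gcc"):
    print(tool, "->", shutil.which(tool) or "NOT FOUND")

# If nvcc was found, it should report "release 8.0" for this toolkit.
if shutil.which("nvcc"):
    subprocess.call(["nvcc", "--version"])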


7) Next, bring up your Anaconda console prompt and install some packages. Type the following lines:

pip install Pillow
pip install Pycuda
pip install Theano
pip install Chainer

If any of those fail to install, stop and figure out why. Append a -vvvv to the end of the install lines to get a very-very-very verbose dump of the install process.

Note: If you can't get pycuda to install due to "missing stdlib.h" errors, you can download a prebuilt pycuda wheel (.whl) file and pip install that directly instead.

That is likely because one of your steps #1-6 isn't quite right, or because your GCC compiler is picking up an old 32-bit version that you installed long ago. (That was the case for me: I had Cygwin and a 32-bit GCC compiler on my PATH that caused the pip package installs to fail.)
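
A quick way to see which GCC pip will pick up, and whether it is the 64-bit MinGW-w64 one, is a sketch like this (gcc -dumpmachine prints the compiler's target; the MinGW-w64 compiler from step 5 should report something like x86_64-w64-mingw32):

import shutil
import subprocess

# Show which gcc wins on PATH and what target it was built for.
# A stale 32-bit Cygwin/MinGW gcc here explains failing pip builds.
print("gcc on PATH:", shutil.which("gcc"))
print("gcc target:", subprocess.check_output(["gcc", "-dumpmachine"]).decode().strip())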

I also had some build failures on Chainer, with errors about "_hypot" being undefined. I fixed those by opening C:\ProgramData\Anaconda3\include\pyconfig.h and commenting out the two places in that file that do this:
//#define hypot _hypot
That appears to have fixed the issue for me, but there's probably a better solution.
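
If you hit the same error, here's a small sketch that prints the offending lines in pyconfig.h (with their line numbers) so you know exactly what to comment out; the path assumes the "All Users" install from step 4:

# Find the "#define hypot _hypot" lines in Anaconda's pyconfig.h.
# Adjust the path if you installed Anaconda for a single user.
pyconfig = r"C:\ProgramData\Anaconda3\include\pyconfig.h"
with open(pyconfig) as f:
    for number, line in enumerate(f, start=1):
        if "_hypot" in line:
            print(number, line.rstrip())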

8) Sanity checks and smoke tests:
First, try to import the packages from a Python command window. You can run this directly from your Anaconda console like this:
  • python -c "import theano"
  • python -c "import chainer"
  • python -c "import cupy"
If one of them fails, identify the error message and ask the Google about it. They should all import cleanly.
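
Beyond plain imports, you can also ask the packages whether they actually see the GPU. This is a sketch using calls from the PyCUDA/Chainer/CuPy versions that were current for this setup, so adjust if your versions differ:

import pycuda.driver as drv
import chainer
import cupy

# Initialize the CUDA driver and report what PyCUDA can see.
drv.init()
print("CUDA devices seen by PyCUDA:", drv.Device.count())
print("Device 0 name:", drv.Device(0).name())

# Chainer and CuPy should both report the GPU as usable.
print("chainer.cuda.available:", chainer.cuda.available)
print("CuPy device count:", cupy.cuda.runtime.getDeviceCount())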


A last smoke test is to run the "Hello GPU" example code from the PyCUDA documentation.

Here's a copy:
import pycuda.autoinit
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule

# Compile a tiny CUDA kernel that multiplies two arrays element-wise.
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")
multiply_them = mod.get_function("multiply_them")

# Two random input arrays and an output buffer, all single precision.
a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)

# Launch the kernel with one thread per element, then compare to the CPU result.
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400,1,1), grid=(1,1))
print (dest-a*b)

I had to change the last line of that code to have parentheses around it (print is a function in Python 3) like this:
print (dest-a*b)

When you run that with a command like this:
python pycuda_test.py
You should get an output of all 0's (the GPU result minus the CPU result), which means the kernel ran correctly.



Conclusion:
If you've gotten this far, congratulations! Your Windows 10 environment should be all set up to run Python GPU processing.

My GPU has been running for days at 90%, and my CPU is free for other work. This was seriously miserable to figure out, but now it feels like my computer's processing power has doubled!

Enjoy the awesome!
