Install notes for TACC/Stampede

Install notes for building our python3 stack on TACC/Stampede, using the intel compiler suite. Many thanks to Yaakoub El Khamra at TACC for help in sorting out the python3 build and numpy linking against a fast MKL BLAS.

On Stampede, we can in principle install either a gcc/mvapich2/fftw3 stack with OpenBLAS or an intel/mvapich2/fftw3 stack with MKL. mvapich2 is causing problems for us, and this appears to be a known issue with mvapich2/1.9, so for now we must use the intel/mvapich2/fftw3 stack, which has mvapich2/2.0b. The intel stack should also, in principle, allow us to explore auto-offloading with the Xeon Phi (MIC) hardware accelerators. Current gcc instructions can be found under NASA Pleiades.


Here is my current build environment (from running module list):

  1. TACC-paths
  2. Linux
  3. cluster-paths
  4. TACC
  5. cluster
  6. intel/
  7. mvapich2/2.0b


To get here from a gcc default, do the following:

module unload mkl
module swap gcc intel/

In the intel compiler stack we need to use mvapich2/2.0b, which in turn implies intel/. Right now, TACC has not built fftw3 for this stack, so we’ll be doing our own FFTW build.
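Before starting any of the builds below, it can help to confirm that the right modules are actually loaded. The following is a minimal sketch, assuming the module system exports LOADEDMODULES as a colon-separated list (both Lmod and Environment Modules do this); require_modules is a hypothetical helper, not a TACC command, and the exact intel version is deliberately left unspecified.

```bash
# Hypothetical helper: succeed only if every requested module is loaded.
# A bare name (e.g. "intel") matches any version; "name/version" must match exactly.
require_modules() {
    local missing=0 want
    for want in "$@"; do
        case ":${LOADEDMODULES:-}:" in
            *":${want}:"* | *":${want}/"*) ;;   # loaded: nothing to do
            *) echo "missing module: ${want}" >&2; missing=1 ;;
        esac
    done
    return "$missing"
}

# Usage, after the module swap above:
# require_modules intel mvapich2/2.0b || echo "fix your modules before building"
```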

See the Stampede user guide for more details. If you would like to always auto-load the same modules at startup, build your desired module configuration and then run:

module save

For ease in structuring the build, for now we’ll define:

export BUILD_HOME=$HOME/build_intel

Python stack

Building Python3

Create ~/build_intel and then download and install Python 3.3.3:

cd ~/build_intel
tar -xzf Python-3.3.3.tgz
cd Python-3.3.3

# make sure you have the python patch, put it in Python-3.3.3
tar xvf python_intel_patch.tar

./configure --prefix=$BUILD_HOME \
                     CC=icc CFLAGS="-mkl -O3 -xHost -fPIC -ipo" \
                     CXX=icpc CPPFLAGS="-mkl -O3 -xHost -fPIC -ipo" \
                     F90=ifort F90FLAGS="-mkl -O3 -xHost -fPIC -ipo" \
                     --enable-shared LDFLAGS="-lpthread" \
                     --with-cxx-main=icpc --with-system-ffi

make install

The key to a successful python3 build is replacing the file ffi64.c. Download the crude patch python_intel_patch.tar and unpack it in your Python-3.3.3 directory (the tar xvf python_intel_patch.tar line above); it will overwrite ffi64.c. If you forget this step, you’ll see a warning/error that _ctypes couldn’t be built. This is important.

Here we are building everything in ~/build_intel; you can build wherever you like, but adjust the instructions above accordingly. The build proceeds quickly (a few minutes).
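Because a missing _ctypes only shows up as a warning buried in the make output, a quick import check after make install can catch a forgotten patch. This is a sketch under the assumption that the interpreter lives at $BUILD_HOME/bin/python3 (matching the --prefix above); check_ctypes is a hypothetical helper name.

```bash
# Hypothetical helper: check that the new interpreter can import _ctypes.
# Defaults to $BUILD_HOME/bin/python3; pass another interpreter to override.
check_ctypes() {
    local py="${1:-${BUILD_HOME:-$HOME/build_intel}/bin/python3}"
    if "$py" -c 'import _ctypes' >/dev/null 2>&1; then
        echo "ok: _ctypes imports under $py"
    else
        echo "FAIL: _ctypes missing under $py; unpack python_intel_patch.tar and rebuild" >&2
        return 1
    fi
}

# Usage:
# check_ctypes
```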

Installing FFTW3

We need to build our own FFTW3, under intel 14 and mvapich2/2.0b:

tar -xzf fftw-3.3.3.tar.gz
cd fftw-3.3.3

./configure --prefix=$BUILD_HOME \
                      CC=mpicc \
                      CXX=mpicxx \
                      F77=mpif90 \
                      MPICC=mpicc MPICXX=mpicxx \
                      --enable-shared \
                      --enable-mpi --enable-openmp --enable-threads
make install

It’s critical that you use mpicc as the C compiler (and the other MPI wrappers listed above). Otherwise the libmpich libraries are not linked in correctly, and dedalus fails on fftw import.
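A quick way to confirm the configure flags took effect is to look for the separate MPI, OpenMP and threads libraries that FFTW installs. The sketch below assumes the default double-precision library names and the $BUILD_HOME/lib location used above; check_fftw_libs is a hypothetical helper, and it only checks that the files exist, not what they link against.

```bash
# Hypothetical helper: confirm the shared FFTW variants were installed.
check_fftw_libs() {
    local libdir="${1:-${BUILD_HOME:-$HOME/build_intel}/lib}" lib status=0
    for lib in libfftw3 libfftw3_mpi libfftw3_omp libfftw3_threads; do
        if [ -e "$libdir/$lib.so" ]; then
            echo "found   $libdir/$lib.so"
        else
            echo "MISSING $libdir/$lib.so" >&2
            status=1
        fi
    done
    return "$status"
}

# To check the libmpich linkage itself, inspect the MPI library with ldd:
# ldd "$BUILD_HOME/lib/libfftw3_mpi.so" | grep -i mpich
```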

Updating shell settings

At this point, python3 is installed in ~/build_intel/bin/. Add this directory to your PATH and confirm (there is currently no python3 in the default path, so which python3 will fail if you haven’t added ~/build_intel/bin).

On Stampede, login shells (interactive connections via ssh) source only the first of ~/.bash_profile, ~/.bash_login or ~/.profile that exists, in that order, and do not source ~/.bashrc. Meanwhile, non-login shells source only ~/.bashrc (see the Stampede user guide).

In the bash shell, add the following to ~/.bashrc:

export PATH=~/build_intel/bin:$PATH
export LD_LIBRARY_PATH=~/build_intel/lib:$LD_LIBRARY_PATH

and the following to ~/.profile:

if [ -f ~/.bashrc ]; then . ~/.bashrc; fi

(from the Bash Reference Manual) to obtain the same behaviour in both shell types.
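Since ~/.profile now sources ~/.bashrc, the same exports can run more than once in a session, and an unset LD_LIBRARY_PATH leaves a trailing ":" (an empty entry, which the dynamic loader treats as the current directory). As a possible refinement, here is a sketch of a ~/.bashrc fragment that adds each directory only once; prepend_path is a hypothetical helper, not a bash builtin, and it relies on bash-specific features (printf -v and ${!var}).

```bash
# Hypothetical helper: prepend a directory to a colon-separated variable
# (PATH, LD_LIBRARY_PATH, ...) unless it is already present.
prepend_path() {
    local var="$1" dir="$2" current
    current="${!var:-}"                        # value of the variable named by $var
    case ":${current}:" in
        *":${dir}:"*) ;;                       # already present: leave unchanged
        "::") printf -v "$var" '%s' "$dir" ;;  # empty or unset: no trailing colon
        *) printf -v "$var" '%s:%s' "$dir" "$current" ;;
    esac
    export "$var"
}

prepend_path PATH "$HOME/build_intel/bin"
prepend_path LD_LIBRARY_PATH "$HOME/build_intel/lib"
```

Re-sourcing ~/.bashrc with this fragment leaves PATH and LD_LIBRARY_PATH unchanged after the first time.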

Installing pip

We’ll use pip to install our python library dependencies. Instructions on doing this are available here and summarized below. First download and install setuptools:

cd ~/build_intel

Then install pip:

wget --no-check-certificate
python3 --cert /etc/ssl/certs/ca-bundle.crt

Now edit ~/.pip/pip.conf so that it contains:

[global]
cert = /etc/ssl/certs/ca-bundle.crt
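If you prefer not to open an editor, the same file can be written from the shell. This is a sketch: it assumes the ~/.pip/pip.conf location used above, it overwrites any existing pip.conf, and PIP_CONF_DIR is a hypothetical override added only so the snippet can be tried in a scratch directory.

```bash
# Write a pip.conf whose cert option sits under a [global] section header.
write_pip_conf() {
    local dir="${PIP_CONF_DIR:-$HOME/.pip}"   # PIP_CONF_DIR: hypothetical override
    mkdir -p "$dir"
    cat > "$dir/pip.conf" <<'EOF'
[global]
cert = /etc/ssl/certs/ca-bundle.crt
EOF
    echo "wrote $dir/pip.conf"
}
```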

You will now have pip3 and pip installed in ~/build_intel/bin. You might try running pip -V to confirm that pip is built against python 3.3. We will use pip3 throughout this documentation to remain compatible with systems (e.g., Mac OS) where multiple versions of python coexist.

Installing nose

Nose is useful for unit testing, especially in checking our numpy build:

pip3 install nose

Numpy and BLAS libraries

Building numpy against MKL

Now, acquire numpy (1.8.0):

cd ~/build_intel
tar -xvf numpy-1.8.0.tar.gz
cd numpy-1.8.0
tar xvf numpy_intel_patch.tar

This last step saves you from hand-editing two files in numpy/distutils; these are and fcompiler/ I’ve built a crude patch, numpy_intel_patch.tar, which can be deployed automatically within the numpy-1.8.0 directory using the instructions above. This will unpack and overwrite:


We now need to make sure that numpy builds against the MKL libraries. Start by making a site.cfg file:

cp site.cfg.example site.cfg
emacs -nw site.cfg

Edit the [mkl] section of site.cfg, modifying the library directory so that it correctly points to TACC’s $MKLROOT/lib/intel64/. With the modules loaded above, this looks like:

library_dirs = /opt/apps/intel/13/composer_xe_2013_sp1.1.106/mkl/lib/intel64
include_dirs = /opt/apps/intel/13/composer_xe_2013_sp1.1.106/mkl/include
mkl_libs = mkl_rt
lapack_libs =

These are based on Intel’s instructions for compiling numpy with ifort, and they seem to work so far.
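Rather than hard-coding the composer_xe path, the [mkl] section can be generated from $MKLROOT. This is a sketch, assuming the intel module exports MKLROOT (as referenced above) and that site.cfg has no uncommented [mkl] section yet (the copy of site.cfg.example only carries a commented-out one); append_mkl_section is a hypothetical helper.

```bash
# Hypothetical helper: append an [mkl] section built from $MKLROOT.
append_mkl_section() {
    local cfg="${1:-site.cfg}"
    if [ -z "${MKLROOT:-}" ]; then
        echo "MKLROOT is not set; load the intel module first" >&2
        return 1
    fi
    if grep -q '^\[mkl\]' "$cfg" 2>/dev/null; then
        echo "$cfg already has an [mkl] section; edit it by hand" >&2
        return 1
    fi
    cat >> "$cfg" <<EOF
[mkl]
library_dirs = ${MKLROOT}/lib/intel64
include_dirs = ${MKLROOT}/include
mkl_libs = mkl_rt
lapack_libs =
EOF
}

# Usage, from inside numpy-1.8.0 after copying site.cfg.example:
# append_mkl_section site.cfg
```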

Then proceed with:

python3 setup.py config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install

This will config, build and install numpy.

Test numpy install

Test that things worked with the executable script numpy_test_full. You can do this fully automatically by running:

chmod +x numpy_test_full
./numpy_test_full

or do so manually by launching python3 and then running:

import numpy as np

If you’ve installed nose (with pip3 install nose), we can further test our numpy build by running np.test() and np.test('full') from within python3.


np.test() reports two failures, while np.test('full') reports 3 failures and 19 errors. We do, however, successfully link against the fast BLAS libraries (look for the FAST BLAS output and a fast dot product time).


We should check what impact these failed tests have on our results.

Python library stack

After numpy has been built (see above), we will proceed with the rest of our python stack. Right now, all of these need to be installed in each existing virtualenv instance (e.g., openblas, mkl, etc.).

For now, skip the venv process.

Installing Scipy

Scipy is easier, because it just gets its config from numpy. Download and install in your appropriate ~/venv/INSTANCE directory:

tar -xvf scipy-0.13.2.tar.gz
cd scipy-0.13.2

Then run

python3 setup.py config --compiler=intelem --fcompiler=intelem build_clib \
                                        --compiler=intelem --fcompiler=intelem build_ext \
                                        --compiler=intelem --fcompiler=intelem install

Installing mpi4py

This should just be pip installed:

pip3 install mpi4py==2.0.0


If we use

pip3 install mpi4py

then pip on Stampede tries to pull version 0.6.0 of mpi4py, hence the explicit version pin above.

Installing cython

This should just be pip installed:

pip3 install -v

The Feb 11, 2014 update to cython (0.20.1) seems to be broken (at least with the intel compilers), which is why we avoid a plain:

pip3 install cython

Installing matplotlib

This should just be pip installed:

pip3 install -v


If we use

pip3 install matplotlib

then pip on Stampede tries to pull version 1.1.1 of matplotlib, hence the explicit version pin above.

Installing sympy

Do this with a regular pip install:

pip3 install sympy

Installing HDF5 with parallel support

The new analysis package brings HDF5 file-writing capability. HDF5 needs to be compiled with support for parallel (MPI) I/O:

tar xvf hdf5-1.8.12.tar
cd hdf5-1.8.12
./configure --prefix=$BUILD_HOME \
                    CC=mpicc \
                    CXX=mpicxx \
                    F77=mpif90 \
                    MPICC=mpicc MPICXX=mpicxx \
                    --enable-shared --enable-parallel
make install
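To confirm that --enable-parallel actually took, you can inspect the build settings. We believe a parallel build installs the h5pcc compiler wrapper, whose -showconfig output (a copy of libhdf5.settings) contains a "Parallel HDF5:" line; treat that exact wording as an assumption. hdf5_is_parallel is a hypothetical helper that reads this text on stdin.

```bash
# Hypothetical helper: succeed only if -showconfig text reports parallel HDF5.
hdf5_is_parallel() {
    grep -Eqi '^[[:space:]]*Parallel HDF5:[[:space:]]*yes'
}

# Usage:
# "$BUILD_HOME/bin/h5pcc" -showconfig | hdf5_is_parallel && echo "parallel HDF5 OK"
```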

Installing h5py

Next, install h5py. We wish for full HDF5 parallel goodness, so we can do parallel file access during both simulations and post analysis as well. This will require building directly from source (see Parallel HDF5 in h5py for further details). Here we go:

git clone
cd h5py
export CC=mpicc
python3 setup.py configure --mpi
python3 setup.py build
python3 setup.py install

After this install, h5py shows up as an .egg in site-packages, but it looks like we pass the suggested test from Parallel HDF5 in h5py.
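A lighter-weight check than the full parallel test is to ask h5py whether it was built with MPI. This sketch assumes h5py exposes h5py.get_config().mpi (present in the h5py releases with parallel support that we know of; it may differ on the mpi_collective branch), and h5py_has_mpi is a hypothetical helper defaulting to the $BUILD_HOME interpreter.

```bash
# Hypothetical helper: exit status 0 if h5py reports an MPI-enabled build.
h5py_has_mpi() {
    local py="${1:-${BUILD_HOME:-$HOME/build_intel}/bin/python3}"
    "$py" -c 'import sys, h5py; sys.exit(0 if h5py.get_config().mpi else 1)'
}

# Usage:
# h5py_has_mpi && echo "h5py built with MPI"
```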

Installing h5py with collectives

We’ve been exploring the use of collectives for faster parallel file writing. To build that version of the h5py library:

git clone
cd h5py
git checkout mpi_collective
export CC=mpicc
python3 setup.py configure --mpi
python3 setup.py build
python3 setup.py install

To enable collective outputs within dedalus, edit dedalus2/data/ and replace:

# Assemble nonconstant subspace
subshape, subslices, subdata = self.get_subspace(out)
dset = task_group.create_dataset(name=name, shape=subshape, dtype=dtype)
dset[subslices] = subdata

with:
# Assemble nonconstant subspace
subshape, subslices, subdata = self.get_subspace(out)
dset = task_group.create_dataset(name=name, shape=subshape, dtype=dtype)
with dset.collective:
    dset[subslices] = subdata

Alternatively, you can see this same edit in some of the forks (Lecoanet, Brown).


There are some serious problems with this right now; in particular, there seems to be an issue with empty arrays causing h5py to hang. Troubleshooting is ongoing.


To build Dedalus itself, with the modules set as above, set:

export BUILD_HOME=$HOME/build_intel
export CC=mpicc

Then change into your root dedalus directory and run:

python3 setup.py build_ext --inplace

Our new stack (intel/14, mvapich2/2.0b) builds to completion and runs test problems successfully. We have good scaling in limited early tests.

Running Dedalus on Stampede

Source the appropriate virtualenv:

source ~/venv/openblas/bin/activate

or

source ~/venv/mkl/bin/activate

Then grab an interactive dev node with idev. Play.

Skipped libraries

Installing freetype2

Freetype is necessary for matplotlib:

cd ~/build
tar -xvf freetype-2.5.2.tar.gz
cd freetype-2.5.2
./configure --prefix=$HOME/build
make install


Skipping for now

Installing libpng

We may need this for matplotlib:

cd ~/build
./configure --prefix=$HOME/build
make install


Skipping for now


We may wish to deploy UMFPACK for sparse matrix solves. Keaton is starting to look at this now. If we do, both numpy and scipy will require UMFPACK, so we should build it before proceeding with those builds.

UMFPACK requires AMD (another package by the same group, not processor) and SuiteSparse_config, too.

If we need UMFPACK, we can try installing it from suite-sparse as in the Mac install. Here are links to UMFPACK docs and Suite-sparse


We’ll check and update this later. (1/9/14)

All I want for Christmas is SuiteSparse

Well, maybe :) Let’s give it a try, and let’s grab the whole library:

tar xvf SuiteSparse.tar.gz

<edit SuiteSparse_config/>


Notes from the original successful build process:

Just got a direct call from Yaakoub. Very, very helpful. Here’s the quick rundown.

He got _ctypes to work by editing the following file:

vim /work/00364/tg456434/yye00/src/Python-3.3.3/Modules/_ctypes/libffi/src/x86/ffi64.c

Do build with intel 14 and use mvapich2/2.0b. We will need to do our own build of fftw3.

Set mpicc as the C compiler rather than icc (same for CXX, FC and the others) when configuring python. This should help with mpi4py.

In mpi4py, we can edit mpi.cfg (non-pip install).

Keep Yaakoub updated with direct e-mail on progress.

Also, Yaakoub is spearheading TACC’s efforts in doing auto-offload to Xeon Phi.

Beware of disk quotas if you’re trying many builds; I hit 5GB pretty fast and blew my matplotlib install due to quota limits :)

Installing virtualenv (skipped)

In order to test multiple numpys and scipys (and really, their underlying BLAS libraries), we will use virtualenv:

pip3 install virtualenv

Next, construct a virtualenv to hold all of your python modules. We suggest doing this in your home directory:

mkdir ~/venv



With help from Yaakoub, we now build _ctypes successfully.

Also, the mpicc build is much, much slower than icc. Interesting. And we crashed out. Here’s what we tried with mpicc:

./configure --prefix=$BUILD_HOME \
                 CC=mpicc CFLAGS="-mkl -O3 -xHost -fPIC -ipo" \
                 CXX=mpicxx CPPFLAGS="-mkl -O3 -xHost -fPIC -ipo" \
                 F90=mpif90 F90FLAGS="-mkl -O3 -xHost -fPIC -ipo" \
                 --enable-shared LDFLAGS="-lpthread" \
                 --with-cxx-main=mpicxx --with-system-ffi