How to compile and install RELION-3.1 on CentOS 8.1

2022-05-10 update: this post is now mostly outdated. The general strategy of installing in /opt and using module files is still relevant, but the specific details and versions mentioned below are outdated. In addition, CentOS transitioned to an update schedule that is out of sync with the Nvidia driver. I now use Rocky Linux for cryoEM computing, and I also no longer recommend installing the Nvidia driver with the “runfile” installer: it is much easier to use the RPM package provided by Nvidia (the CUDA toolkit, however, is best installed in /opt using the “runfile”). You can find up-to-date installation instructions for RELION here.


The lab I work at recently acquired a GPU workstation on which I had to install RELION, a program for processing cryoEM data. Since this is not a straightforward procedure, I took some notes in case I need to do this again in the future. I decided to also post these notes here, in case they can help anyone else.

These notes apply to RELION-3.1 and CentOS 8.1. The usual disclaimers apply: back up your data before modifying your system, don’t run commands you don’t fully understand (especially if they must be run with sudo), and follow these directions at your own risk; I am in no way responsible if you mess up your system in the course of following them. Also, I make no commitment to keep these notes up to date with future versions of RELION or CentOS, and I don’t have time to offer individual help. So please don’t email me questions: email the CCPEM list for questions specific to RELION, or seek help from your distribution’s own channels for general Linux questions.

Conventions

All commands listed below should be run as a normal user, not as root. Commands that require administrator permissions (like installing packages with the system package manager) are prefixed with sudo, which assumes the user account you run these commands from is in the administrator group (wheel).

Programs not installed with the system package manager will be installed in their own directory /opt/<program>-X.Y.Z, in which <program> is the program’s name in lowercase and X.Y.Z is its version number. This will only work if the user performing the installation has write permission to /opt, which is safe to do because /opt is a location reserved for programs that are not part of the system distribution (there is nothing to break there, because the system doesn’t put any of its files there). This has several advantages over making a package for the distribution or installing in /usr/local:

  • the biggest advantage is that we can have several versions of the same program installed at the same time: we can only have one relion binary under /usr/local, but we can have several /opt/relion-X.Y.Z with different version numbers happily living next to each other (sometimes, we need to revisit old results obtained with an earlier version of the program not necessarily compatible with the current version, so this is a true practical advantage);
  • installing a program is easy: in most cases, ./configure --prefix=/opt/<program>-X.Y.Z ; make ; make install will work just fine (as indicated above, without any risk of messing up the base system if run as a regular user), which is easier than making an RPM or DEB or what-have-you package compliant with all of your distribution’s packaging rules;
  • it is easy to check what takes up storage space, with du -sh /opt/*;
  • uninstalling a program is as easy as rm -r /opt/<program>-X.Y.Z, since all of a program’s files are under a single directory, instead of scattered across subdirectories under /usr/local.
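As a small illustration of this naming convention, the following sketch builds the installation prefix from a program name and a version number, then reports whether /opt is writable by the current user. The function name opt_prefix is hypothetical and was made up for this example.

```shell
# Hypothetical helper: build the /opt/<program>-X.Y.Z prefix used in these notes.
opt_prefix() {
    # $1: program name (any case), $2: version number
    program=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf '/opt/%s-%s\n' "$program" "$2"
}

# Example: print the prefix for RELION 3.1
prefix=$(opt_prefix RELION 3.1)
echo "$prefix"

# Installing as a regular user only works if /opt is writable by that user
if [ -w /opt ]; then
    echo "/opt is writable by $(id -un)"
else
    echo "/opt is not writable by $(id -un)"
fi
```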

Now, /usr/local/bin is in users’ PATH by default, while this is not the case for arbitrary directories under /opt. How do we make our custom-built programs accessible from the shell with minimal configuration for our users? Obviously, having every user edit their ~/.bashrc file is not a viable option: they would have to do that every time they want to change which version of a program they use, and this is error prone. The solution is to use the Environment Modules system, installed by default on CentOS 8.1. This allows us to write modulefiles that will correctly set up environment variables for each specific program. We will store these modulefiles under /opt/modulefiles, and append this path to $MODULESHOME/init/.modulespath so the module commands can use our custom modulefiles. The file $MODULESHOME/init/.modulespath initially looks something like this:

# This file defines the initial setup for the modulefiles search path
# Each line containing one or multiple paths delimited by ':' will be
# added to the MODULEPATH environment variable.
/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles

Edit it with sudo vi $MODULESHOME/init/.modulespath to add /opt/modulefiles to the $MODULEPATH variable. The file should now look like this:

# This file defines the initial setup for the modulefiles search path
# Each line containing one or multiple paths delimited by ':' will be
# added to the MODULEPATH environment variable.
/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles:/opt/modulefiles

Our users can now list all available modules on the system (module avail) and easily set their shell environment to use what they need (module load <program/X.Y.Z>).
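If you prefer not to edit the file by hand, the same change could be scripted. This is only a sketch: the function name add_modulepath is made up, and it adds the directory on a line of its own instead of extending the existing line, which should be equivalent according to the comment at the top of the file (each line is added to MODULEPATH). The demonstration below works on a scratch file; the real file belongs to root, so writing to it would require administrator permissions.

```shell
# Hypothetical helper: add a directory to a .modulespath-style file, only once.
add_modulepath() (
    # $1: path to the .modulespath file, $2: directory to add
    file=$1
    dir=$2
    # Split non-comment lines on ':' and look for an exact match
    if grep -v '^#' "$file" | tr ':' '\n' | grep -qxF "$dir"; then
        exit 0    # already listed, nothing to do
    fi
    printf '%s\n' "$dir" >> "$file"
)

# Demonstration on a scratch file, not on the real one
demo=$(mktemp)
printf '%s\n' '# comment line' '/usr/share/Modules/modulefiles:/etc/modulefiles' > "$demo"
add_modulepath "$demo" /opt/modulefiles
add_modulepath "$demo" /opt/modulefiles   # second call changes nothing
cat "$demo"
rm -f "$demo"
```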

Requirements

To compile RELION, we need to install development tools:

sudo dnf group install "Development Tools"
sudo dnf install cmake

If you wonder what comes with the “Development Tools” group, you can find out with the following command:

dnf group info "Development Tools"

And we need the following libraries:

sudo dnf install fftw-devel fltk-devel libX11-devel libtiff-devel libpng-devel freetype-devel

The following may not be necessary for RELION, but some libraries are in the PowerTools repository, so we might need to activate it:

sudo dnf config-manager --set-enabled PowerTools

And other required packages are in EPEL:

sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
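Before moving on, it can be useful to confirm that the build tools ended up on the PATH. The sketch below is one way to do it; the function name missing_cmds and the list of commands checked are assumptions for this example, not something RELION prescribes.

```shell
# Hypothetical helper: print each command from the argument list that is not on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example: tools used later in these notes (adjust the list as needed)
missing=$(missing_cmds gcc g++ make cmake git wget)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All build tools found."
fi
```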

Choose a version of CUDA

Compiling and running RELION with GPU acceleration requires the CUDA Toolkit and libraries. The easiest way to compile RELION-3.1 on CentOS 8.1 is to use CUDA 10.2, which supports the version of GCC (8.3.1) that comes with CentOS 8.1. However, RELION can use external programs for motion correction and CTF estimation, which are not open source and depend on different versions of CUDA. The trade-offs, at the time I am writing this, go as follows:

  1. Compile RELION with CUDA 10.2. This is the easiest way to go and has the fewest prerequisites. For motion correction, you will be able to use either MotionCor2 (not open source, but version 1.3.1 ships a binary compiled with CUDA 10.2) or RELION’s own motion correction program (not GPU-accelerated). For CTF estimation, you will be limited to CTFFIND4 (not GPU-accelerated), since Gctf version 1.18 does not ship a binary compiled with CUDA 10.2 (and likely won’t, because it does not seem to be maintained anymore; it is also not open source, which prevents compiling it with one’s preferred version of CUDA).
  2. Compile RELION with CUDA 9.2. This is a bit more difficult because CUDA 9.2 is not compatible with GCC 8.3.1, which is the default compiler on CentOS 8.1: for CUDA 9.2 to work, you will therefore have to install a version of GCC earlier than version 7. The advantage of using CUDA 9.2 is that you can then have all programs working in the same environment: MotionCor2 (version 1.3.1 ships a binary built with CUDA 9.2), Gctf (version 1.18 also ships a binary built with CUDA 9.2), RELION’s own motion correction program (independent of CUDA because not GPU-accelerated) and CTFFIND4 (also independent of CUDA because not GPU-accelerated). Even though this is a bit more work for the system administrator, this is an easier setup for the users since they will be able to use any combination of these programs in a single shell.
  3. Compile two copies of RELION, one with CUDA 9.2 and one with CUDA 10.2. This requires the work of both options above, but the environment modules system makes it practical. This way, one can use the RELION compiled with CUDA 10.2 for everything, and only use the one compiled with CUDA 9.2 to run Gctf. The environment must be changed at run time to choose which version of RELION and CUDA to use, but this is easy with the environment modules system.
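The compiler constraint behind these options can be expressed as a quick check. The sketch below uses the threshold stated in these notes (CUDA 9.2 needs a GCC earlier than version 7); the function names are hypothetical. On a real system, the version string could come from gcc -dumpversion.

```shell
# Hypothetical helper: extract the major version from a string like "8.3.1".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: succeed if this GCC is too recent for CUDA 9.2,
# using the threshold stated in these notes (GCC must be earlier than version 7).
too_new_for_cuda92() {
    [ "$(gcc_major "$1")" -ge 7 ]
}

# Example with the two compiler versions mentioned in these notes
for v in 8.3.1 6.5.0; do
    if too_new_for_cuda92 "$v"; then
        echo "GCC $v: use CUDA 10.2, or install an older GCC for CUDA 9.2"
    else
        echo "GCC $v: usable with CUDA 9.2"
    fi
done
```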

I wanted to go with option 1, but the CTFFIND4 binaries downloaded from its website don’t run on CentOS 8.1 (they crash with a segmentation fault, and inspecting them with file ctffind reports they were compiled for a Linux 2.6 kernel, which is several versions older than the kernel version in CentOS 8.1). Since this program is open source, I tried to compile it. It compiled fine and runs long enough to interactively pass all parameter prompts, but then crashes with a segmentation fault. This may be due to the fact that I compiled it with GCC, as a somewhat old email on the CCPEM list suggests that it should be compiled with ICC. So I chose option 3.

Install CUDA 10.2

The following commands will download the installer from Nvidia’s website and install only the CUDA Toolkit (not the driver):

cd ~/Downloads
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
chmod +x cuda_10.2.89_440.33.01_linux.run
./cuda_10.2.89_440.33.01_linux.run --silent --toolkit --toolkitpath=/opt/cuda-10.2
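To confirm that the toolkit landed where expected, one option is to ask the freshly installed nvcc for its release number. This is a sketch: the function name nvcc_release is made up, and it assumes the output of nvcc --version contains a line of the form "Cuda compilation tools, release 10.2, V10.2.89".

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print the release number.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Example (path taken from the --toolkitpath used above)
nvcc=/opt/cuda-10.2/bin/nvcc
if [ -x "$nvcc" ]; then
    "$nvcc" --version | nvcc_release
else
    echo "$nvcc not found"
fi
```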

Then place this module file in /opt/modulefiles/cuda/10.2:

#%Module1.0
  
proc ModulesHelp { } {
global dotversion
puts stderr "\tCUDA Libraries and Toolkit, version 10.2"
}

module-whatis "CUDA Libraries and Toolkit. Documentation: https://docs.nvidia.com/cuda/archive/10.2/"

conflict cuda

set program cuda
set version 10.2
set prefix /opt/$program-$version

prepend-path PATH $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib64
prepend-path CPATH $prefix/include
prepend-path C_INCLUDE_PATH $prefix/include
prepend-path CPLUS_INCLUDE_PATH $prefix/include
prepend-path INCLUDE $prefix/include
setenv CUDA_HOME $prefix
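For readers unfamiliar with modulefiles, here is roughly what loading this module does to the shell environment, written as plain shell. This is only an illustration of the prepend-path and setenv directives, with a made-up function name prepend_path; it is not how the module command is implemented.

```shell
# Rough shell equivalent of the modulefile's prepend-path directive (illustration only).
prepend_path() {
    # $1: variable name, $2: directory to put in front
    eval "pp_current=\${$1:-}"
    if [ -n "$pp_current" ]; then
        eval "export $1=\"\$2:\$pp_current\""
    else
        eval "export $1=\"\$2\""
    fi
}

# What loading cuda/10.2 amounts to for two of the variables, plus the setenv line
prefix=/opt/cuda-10.2
prepend_path PATH "$prefix/bin"
prepend_path LD_LIBRARY_PATH "$prefix/lib64"
export CUDA_HOME="$prefix"
echo "$PATH"
```

Unlike this sketch, the module command also knows how to undo these changes (module unload, module purge), which is the main reason to prefer modulefiles over editing ~/.bashrc.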

Install CUDA 9.2

For this, we first need to install a version of GCC earlier than version 7. I chose to go for the latest version in the 6.x series, which is 6.5.0. Compiling GCC requires the following system packages:

sudo dnf install gmp-devel mpfr-devel libmpc-devel

And here is how to download GCC 6.5.0’s source, compile and install it (you can of course choose a different mirror closer to you):

cd ~/Downloads
wget ftp://ftp.uvsq.fr/pub/gcc/releases/gcc-6.5.0/gcc-6.5.0.tar.gz
tar -xf gcc-6.5.0.tar.gz
cd gcc-6.5.0
mkdir build
cd build
../configure --prefix=/opt/gcc-6.5.0 --disable-multilib
make
make install-strip

Finally, save this module file as /opt/modulefiles/gcc/6.5.0:

#%Module1.0
  
proc ModulesHelp { } {
global dotversion
puts stderr "\tGNU Compiler Collection, version 6.5.0"
}

module-whatis "GNU Compiler Collection. Documentation: https://www.gnu.org/software/gcc/"

set program gcc
set version 6.5.0
set prefix /opt/$program-$version

prepend-path PATH $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib
prepend-path LD_LIBRARY_PATH $prefix/lib64
prepend-path CPATH $prefix/include
prepend-path C_INCLUDE_PATH $prefix/include
prepend-path CPLUS_INCLUDE_PATH $prefix/include

# Take higher priority than system CC
setenv CC $prefix/bin/gcc
setenv CXX $prefix/bin/g++

We can then install CUDA 9.2:

cd ~/Downloads
wget https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux
chmod +x cuda_9.2.148_396.37_linux
./cuda_9.2.148_396.37_linux --silent --toolkit --toolkitpath=/opt/cuda-9.2

Let’s also install the patch:

module purge
module load gcc/6.5.0
cd ~/Downloads
wget https://developer.nvidia.com/compute/cuda/9.2/Prod2/patches/1/cuda_9.2.148.1_linux
chmod +x cuda_9.2.148.1_linux
./cuda_9.2.148.1_linux --silent --accept-eula --installdir=/opt/cuda-9.2

And finally, save this module file as /opt/modulefiles/cuda/9.2:

#%Module1.0
  
proc ModulesHelp { } {
global dotversion
puts stderr "\tCUDA Libraries and Toolkit, version 9.2"
}

module-whatis "CUDA Libraries and Toolkit. Documentation: https://docs.nvidia.com/cuda/archive/9.2/"

conflict cuda

set program cuda
set version 9.2
set prefix /opt/$program-$version

prepend-path PATH $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib64
prepend-path CPATH $prefix/include
prepend-path C_INCLUDE_PATH $prefix/include
prepend-path CPLUS_INCLUDE_PATH $prefix/include
prepend-path INCLUDE $prefix/include
setenv CUDA_HOME $prefix

Install OpenMPI 3.1.6

RELION also requires OpenMPI. I first tried to use the RPM package:

sudo dnf install openmpi-devel

I managed to compile RELION with this OpenMPI, which happens to be version 4.0.1, but then I got segmentation faults at run time, making this build of RELION essentially useless (parallelization with OpenMPI is used for pretty much everything in RELION). I investigated this error and found that cmake had picked up an OpenMPI version 3.1:

-- Found MPI_C: /usr/lib64/openmpi/lib/libmpi.so (found version "3.1")

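I still don’t understand how this is possible, since the RPM package did not install anything from version 3.1 (one possible explanation is that cmake reports the version of the MPI standard that the library implements, which is 3.1 for Open MPI 4.0.x, rather than the Open MPI release number):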

$ ls -l /usr/lib64/openmpi/lib/ | grep libmpi.so
lrwxrwxrwx. 1 root root      17 Nov 21  2019 libmpi.so -> libmpi.so.40.20.1
lrwxrwxrwx. 1 root root      17 Nov 21  2019 libmpi.so.40 -> libmpi.so.40.20.1
-rwxr-xr-x. 1 root root 2422280 Nov 21  2019 libmpi.so.40.20.1

But then the mpirun used at run time was definitely 4.0.1, and that caused problems:

$ mpirun --version
mpirun (Open MPI) 4.0.1

I asked about this problem on the CCPEM list, and from the answer I got I understand that using different versions of OpenMPI during compilation and at run time definitely causes this kind of problem. But what is still not clear to me is whether RELION would have worked fine with OpenMPI 4.0.1, had it been correctly compiled and run with this version, or whether it is only compatible with OpenMPI 3.x versions. So, I installed OpenMPI 3.1.6 (newest version in the 3.x series) from source:

cd ~/Downloads
wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.6.tar.bz2
tar -xf openmpi-3.1.6.tar.bz2
cd openmpi-3.1.6
mkdir build
cd build
../configure --prefix=/opt/openmpi-3.1.6
make
make install-strip
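Since a mismatch between the OpenMPI used at compile time and at run time was the suspected cause of the crashes described above, it may be worth checking which mpirun is on the PATH before building or running RELION. The sketch below does this; the function name mpirun_version is made up, and it assumes the first line of mpirun --version looks like the output quoted earlier ("mpirun (Open MPI) 4.0.1").

```shell
# Hypothetical helper: print the version from the first line of `mpirun --version`,
# e.g. "mpirun (Open MPI) 4.0.1" gives "4.0.1".
mpirun_version() {
    sed -n '1s/^mpirun (Open MPI) \([0-9][0-9.]*\).*/\1/p'
}

# Example: compare the mpirun found on PATH with the version we intend to use
expected=3.1.6
if command -v mpirun >/dev/null 2>&1; then
    found=$(mpirun --version 2>/dev/null | mpirun_version)
    echo "mpirun on PATH: $(command -v mpirun), version ${found:-unknown} (expected $expected)"
else
    echo "no mpirun on PATH"
fi
```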

I then adapted the module file provided by the RPM package to make one for this version of OpenMPI. I stored this module file in /opt/modulefiles/openmpi/3.1.6:

#%Module 1.0
#
#  OpenMPI module for use with 'environment-modules' package:
#
conflict                mpi
prepend-path            PATH            /opt/openmpi-3.1.6/bin
prepend-path            LD_LIBRARY_PATH /opt/openmpi-3.1.6/lib
prepend-path            PKG_CONFIG_PATH /opt/openmpi-3.1.6/lib/pkgconfig
prepend-path            MANPATH         /opt/openmpi-3.1.6/share/man
setenv                  MPI_BIN         /opt/openmpi-3.1.6/bin
setenv                  MPI_SYSCONFIG   /opt/openmpi-3.1.6/etc
setenv                  MPI_FORTRAN_MOD_DIR     /usr/lib64/gfortran/modules/openmpi
setenv                  MPI_INCLUDE     /opt/openmpi-3.1.6/include
setenv                  MPI_LIB         /opt/openmpi-3.1.6/lib
setenv                  MPI_MAN         /opt/openmpi-3.1.6/share/man
setenv                  MPI_PYTHON_SITEARCH     /usr/lib64/python3.6/site-packages/openmpi
setenv                  MPI_PYTHON2_SITEARCH    /usr/lib64/python3.6/site-packages/openmpi
setenv                  MPI_PYTHON3_SITEARCH    /usr/lib64/python3.6/site-packages/openmpi
setenv                  MPI_COMPILER    openmpi-x86_64
setenv                  MPI_SUFFIX      _openmpi
setenv                  MPI_HOME        /opt/openmpi-3.1.6
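Because a modulefile like this one repeats the installation prefix on many lines, a mistyped path is easy to miss. The sketch below lists any /opt/... path in a modulefile that does not sit under the expected prefix; the function name stray_paths is made up for this example, and it relies on grep -o, which GNU grep supports.

```shell
# Hypothetical helper: list /opt/... paths in a modulefile that are not under the expected prefix.
stray_paths() {
    # $1: modulefile, $2: expected installation prefix
    grep -o '/opt/[^[:space:]]*' "$1" |
        awk -v p="$2" '$0 != p && index($0, p "/") != 1'
}

# Example: no output means every /opt path uses the expected prefix
modulefile=/opt/modulefiles/openmpi/3.1.6
if [ -f "$modulefile" ]; then
    stray_paths "$modulefile" /opt/openmpi-3.1.6
else
    echo "$modulefile not found"
fi
```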

Install RELION

After many detours, we finally have all we need to compile and install RELION. We get the source code by cloning the git repository the first time:

mkdir ~/software
cd ~/software
git clone https://github.com/3dem/relion

Next time you need to update it, pull the new changes:

cd ~/software/relion
git pull

Compile RELION with CUDA 10.2

The following commands will configure and compile RELION-3.1 (at the time I wrote these notes, it was commit 5997001f75) with CUDA 10.2. Change the value of -DCUDA_ARCH= to match your GPU: this is its “compute capability” as listed by Nvidia, without the dot (choose the highest one supported by both your GPU and the version of CUDA you’re using; if you don’t specify this option, cmake seems to default to a very low compute capability, which is compatible with more combinations of GPU and CUDA version but means you won’t take full advantage of your specific GPU):

cd ~/software/relion
git checkout ver3.1
mkdir build_cuda-10.2
cd build_cuda-10.2
module purge
module load openmpi/3.1.6 cuda/10.2
cmake -DCMAKE_INSTALL_PREFIX=/opt/relion-3.1_cuda-10.2 -DCUDA_ARCH=75 ..
make
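The value passed to -DCUDA_ARCH= is simply the compute capability with the dot removed. As a trivial sketch (the function name cuda_arch is made up):

```shell
# Hypothetical helper: turn a compute capability such as "7.5" into the value for -DCUDA_ARCH.
cuda_arch() {
    printf '%s\n' "$1" | tr -d '.'
}

# Example: compute capability 7.5 gives the value used in the cmake command above
echo "-DCUDA_ARCH=$(cuda_arch 7.5)"
```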

To install it, make sure the destination directory exists:

mkdir /opt/relion-3.1_cuda-10.2

And then run:

make install

Finally, save this module file as /opt/modulefiles/relion/3.1_cuda-10.2. There are more environment variables you can set based on your specific system; you can read about them in RELION’s documentation.

#%Module1.0

proc ModulesHelp { } {
global dotversion
puts stderr "\tRELION, version 3.1 (CUDA 10.2)"
}

module-whatis "2D classification, 3D classification and 3D refinement. Documentation: https://www3.mrc-lmb.cam.ac.uk/relion/index.php/Main_Page"

module load openmpi/3.1.6 cuda/10.2 motioncor2/1.3.1 ctffind/4.1.14

conflict relion

prereq openmpi/3.1.6
prereq cuda/10.2
prereq motioncor2
prereq ctffind

set program relion
set version 3.1_cuda-10.2
set prefix /opt/$program-$version

# Where to find other programs
setenv RELION_MOTIONCOR2_EXECUTABLE MotionCor2_v1.3.1-Cuda102
setenv RELION_CTFFIND_EXECUTABLE ctffind
setenv RELION_RESMAP_EXECUTABLE ResMap
setenv RELION_PDFVIEWER_EXECUTABLE evince
setenv RELION_QSUB_TEMPLATE $prefix/bin/qsub.csh

# MPI and threads settings
# Ask for confirmation if users try to submit local jobs with more than 9 MPI processes. Rationale: 9 MPIs means 1 coordinator + 4GPUs x 2 workers.
setenv RELION_WARNING_LOCAL_MPI 9
# It doesn't help to overbook the GPUs too much. 13 MPIs means 1 coordinator + 4GPUs x 3 workers.
# But some programs like CTFFIND and RELION's MotionCor run on CPUs, so the hard limit on MPI processes should be half the CPU cores.
setenv RELION_MPI_MAX 40
setenv RELION_ERROR_LOCAL_MPI 41

# Shell to launch other programs from
setenv RELION_SHELL bash

# Scratch location
setenv RELION_SCRATCH_DIR /scratch/

prepend-path PATH $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib
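The numbers in the MPI settings above follow from simple arithmetic, sketched below so it can be adapted to a machine with a different number of GPUs or cores. The function names are made up, and the figure of 80 CPU cores is an assumption inferred from the limit of 40 in the module file.

```shell
# Hypothetical helper: MPI processes for 1 coordinator plus N workers on each GPU.
mpi_procs() {
    # $1: number of GPUs, $2: workers per GPU
    echo $(( 1 + $1 * $2 ))
}

# Hypothetical helper: cap on MPI processes, taken as half the CPU cores.
mpi_cap() {
    # $1: number of CPU cores
    echo $(( $1 / 2 ))
}

mpi_procs 4 2    # 4 GPUs, 2 workers each: prints 9
mpi_procs 4 3    # 4 GPUs, 3 workers each: prints 13
mpi_cap 80       # 80 cores (assumed): prints 40
```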

Compile RELION with CUDA 9.2

The following commands will configure and compile RELION-3.1 (at the time I wrote these notes, it was commit 5997001f75) with CUDA 9.2. Change the value of -DCUDA_ARCH= to match your GPU: this is its “compute capability” as listed by Nvidia, without the dot (choose the highest one supported by both your GPU and the version of CUDA you’re using; if you don’t specify this option, cmake seems to default to a very low compute capability, which is compatible with more combinations of GPU and CUDA version but means you won’t take full advantage of your specific GPU):

cd ~/software/relion
git checkout ver3.1
mkdir build_cuda-9.2
cd build_cuda-9.2
module purge
module load openmpi/3.1.6 cuda/9.2 gcc/6.5.0
cmake -DCMAKE_INSTALL_PREFIX=/opt/relion-3.1_cuda-9.2 -DCUDA_ARCH=72 ..
make

To install it, make sure the destination directory exists:

mkdir /opt/relion-3.1_cuda-9.2

And then run:

make install

Finally, save this module file as /opt/modulefiles/relion/3.1_cuda-9.2. There are more environment variables you can set based on your specific system; you can read about them in RELION’s documentation.

#%Module1.0
  
proc ModulesHelp { } {
global dotversion
puts stderr "\tRELION, version 3.1 (CUDA 9.2)"
}

module-whatis "2D classification, 3D classification and 3D refinement. Documentation: https://www3.mrc-lmb.cam.ac.uk/relion/index.php/Main_Page"

module load openmpi/3.1.6 cuda/9.2 motioncor2/1.3.1 gctf/1.18b2

conflict relion

prereq openmpi/3.1.6
prereq cuda/9.2
prereq motioncor2
prereq gctf
#prereq ctffind

set program relion
set version 3.1_cuda-9.2
set prefix /opt/$program-$version

# Where to find other programs
setenv RELION_MOTIONCOR2_EXECUTABLE MotionCor2_v1.3.1-Cuda92
setenv RELION_GCTF_EXECUTABLE Gctf_v1.18_b2_sm70_cu9.2
#setenv RELION_CTFFIND_EXECUTABLE ctffind
setenv RELION_RESMAP_EXECUTABLE ResMap
setenv RELION_PDFVIEWER_EXECUTABLE evince
setenv RELION_QSUB_TEMPLATE $prefix/bin/qsub.csh

# MPI and threads settings
# Ask for confirmation if users try to submit local jobs with more than 9 MPI processes. Rationale: 9 MPIs means 1 coordinator + 4GPUs x 2 workers.
setenv RELION_WARNING_LOCAL_MPI 9
# It doesn't help to overbook the GPUs too much. 13 MPIs means 1 coordinator + 4GPUs x 3 workers.
# But some programs like CTFFIND, Gctf, MotionCor2 and RELION's MotionCor run on CPUs, so the hard limit on MPI processes should be half the CPU cores.
setenv RELION_MPI_MAX 40
setenv RELION_ERROR_LOCAL_MPI 41

# Shell to launch other programs from
setenv RELION_SHELL bash

# Scratch location
setenv RELION_SCRATCH_DIR /scratch/

prepend-path PATH $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib

Running RELION

Now, the command module avail should list all these new module files (I did not show the module files for MotionCor2, Gctf and CTFFIND; those simply contain the same header up to module-whatis, followed by a single prepend-path directive to indicate where to find the binary):

$ module avail
---/opt/modulefiles ---
cuda/9.2
cuda/10.2
ctffind/4.1.14
gcc/6.5.0
gctf/1.18b2
motioncor2/1.3.1
openmpi/3.1.6
relion/3.1_cuda-9.2
relion/3.1_cuda-10.2  
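For the module files not shown (MotionCor2, Gctf and CTFFIND), a minimal one could be generated along these lines. This is a sketch following the description above (same header up to module-whatis, then a single prepend-path): the function name write_tool_modulefile, the module-whatis text and the directory /opt/ctffind-4.1.14/bin are all assumptions. The example writes to a temporary directory rather than to /opt/modulefiles.

```shell
# Hypothetical helper: write a minimal modulefile that only puts one directory on PATH.
write_tool_modulefile() (
    # $1: modulefile root, $2: program, $3: version, $4: directory holding the binary
    mkdir -p "$1/$2"
    cat > "$1/$2/$3" <<EOF
#%Module1.0

proc ModulesHelp { } {
global dotversion
puts stderr "\t$2, version $3"
}

module-whatis "$2, version $3"

prepend-path PATH $4
EOF
)

# Example: generate a modulefile for CTFFIND in a scratch directory and show it
root=$(mktemp -d)
write_tool_modulefile "$root" ctffind 4.1.14 /opt/ctffind-4.1.14/bin
cat "$root/ctffind/4.1.14"
rm -rf "$root"
```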

You can now get one version of RELION or the other on your path like so:

$ module list
No Modulefiles Currently Loaded.
$ which relion
/usr/bin/which: no relion in (/home/guillaume/.local/bin:/home/guillaume/bin:/opt/miniconda3/condabin:/home/guillaume/.local/bin:/home/guillaume/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
$ module load relion/3.1_cuda-9.2
$ module list
Currently Loaded Modulefiles:
 1) openmpi/3.1.6   2) cuda/9.2   3) motioncor2/1.3.1   4) gctf/1.18b2   5) relion/3.1_cuda-9.2
$ which relion
/opt/relion-3.1_cuda-9.2/bin/relion
$ module purge
$ module list
No Modulefiles Currently Loaded.
$ module load relion/3.1_cuda-10.2
$ module list
Currently Loaded Modulefiles:
 1) openmpi/3.1.6   2) cuda/10.2   3) motioncor2/1.3.1   4) ctffind/4.1.14   5) relion/3.1_cuda-10.2
$ which relion
/opt/relion-3.1_cuda-10.2/bin/relion
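With two builds installed, the choice between them follows the rule from option 3: use the CUDA 10.2 build for everything, except when Gctf is needed. A small wrapper could encode that rule; the function name relion_module_for and the job labels are made up for this sketch.

```shell
# Hypothetical helper: pick the RELION module according to the rule from option 3.
relion_module_for() {
    # $1: kind of job; only "gctf" needs the CUDA 9.2 build
    case $1 in
        gctf) echo relion/3.1_cuda-9.2 ;;
        *)    echo relion/3.1_cuda-10.2 ;;
    esac
}

# Example: show which module each kind of job would use
for job in gctf refine3d; do
    echo "$job -> $(relion_module_for "$job")"
done

# In an interactive shell with Environment Modules available, one would then run:
#   module purge
#   module load "$(relion_module_for gctf)"
```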