How to compile HPL (LINPACK)

This guide shows you how to compile HPL (LINPACK) and offers some tips for selecting good input values for HPL.dat, based on my experiences at student cluster competitions.

This benchmark stresses the computer's floating-point capabilities.

Although raw FLOPS alone is not representative of the applications typically run on supercomputers, floating-point performance still matters whenever precise numerical calculations are required.

I assume you already have an MPI implementation, C/C++/Fortran compilers, a BLAS library, and whatever other libraries you need installed.

There are many versions of LINPACK for different architectures, ranging from an Intel version to a CUDA version. The modifications for all versions are very similar. Below I have linked some of the different versions.

Compiling HPL

The first step is to make a copy of an existing makefile from the setup/ folder and place it in the root directory of HPL. I suggest Make.Linux_ATHLON_CBLAS, since it is the closest to a generic system. Call this file Make.[whatever]. For CUDA and Intel, Make.CUDA and Make.intel64 are already created for you.
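For example, assuming a generic Linux system, from the HPL root directory:

cp setup/Make.Linux_ATHLON_CBLAS Make.[whatever]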

Here, you may or may not need to modify TOPdir. Typically you should set it to the full path of your HPL directory.

Next is specifying the location of your MPI files and binaries. MPdir should specify the exact path to the version of MPI you want to use, up to the root directory that contains include, lib, and bin.

MPinc should point to the include directory under MPdir (for example, -I$(MPdir)/include).

MPlib should point to the MPI library itself; the exact file name (for example, libmpich.a) depends on the MPI version you installed.

** Note: if you are using a *.so instead of a *.a, then you need to add the library path to your environment.

vim ~/.bashrc

Add the following to the end of the file:

export LD_LIBRARY_PATH=/path/to/mpi/lib:$LD_LIBRARY_PATH

Then reload the file, and the library path will be set every time you log on to your system.

source ~/.bashrc

Next is linking the BLAS libraries. LAdir should specify the exact location of the BLAS library.

LAinc should specify the BLAS include directory, if you need one.

LAlib should specify the BLAS library file. If it is a *.so file, you can follow the steps above to add it to your environment.

For HPL_OPTS, add -DHPL_DETAILED_TIMING; the detailed timing output makes it much easier to analyze your tuning of the HPL.dat file.

Lastly, you can specify your compiler (CC) and compiler flags (CCFLAGS).
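Putting it all together, the edited portion of Make.[whatever] might look something like the sketch below. This assumes MPICH installed under /opt/mpich and an ATLAS-style CBLAS; your paths, library names, and compiler choices will almost certainly differ.

TOPdir       = /path/to/hpl
MPdir        = /opt/mpich
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a
LAdir        = /path/to/blas/lib
LAinc        =
LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a
HPL_OPTS     = -DHPL_CALL_CBLAS -DHPL_DETAILED_TIMING
CC           = mpicc
CCFLAGS      = $(HPL_DEFS) -O3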

Now to compile:

make arch=[whatever]

If you linked everything correctly, there should be an xhpl binary in the bin/[whatever]/ directory. Otherwise, you need to figure out which library was not linked properly.
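If it is not obvious which library is the problem, ldd will list the shared libraries the binary depends on and flag any that cannot be found:

ldd bin/[whatever]/xhpl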

Now navigate to bin/[whatever]/ to modify the HPL.dat file.

Modifying HPL.dat

The most important lines are:

  • Ns – the problem size (the order of the matrix)
  • NBs – the block size each process operates on at a time
  • Ps and Qs – the P x Q process grid you want to run the matrix on

N should typically be chosen so the matrix fills around 80-90% of total memory.

N can be calculated by:
N = sqrt((Memory Size in GiB * 1024^3 * Number of Nodes) / size of a double (8 bytes)) * percentage

Large Ns yield better results.
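For example, on a hypothetical cluster of 4 nodes with 64 GiB of memory each, targeting 85% of memory, you can let Python do the arithmetic:

python3 -c "print(int((64 * 1024**3 * 4 / 8) ** 0.5 * 0.85))"

This prints 157559, which you would then round down to a multiple of NBs (see below).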


NBs is typically between 32 and 256. A small NBs works well on a single node; NBs in the low 100s to 200s works well across multiple nodes; and a larger NBs, around 1000 or more, works well for GPUs and accelerators.

I normally select N to be a multiple of NBs, so there is no performance drop-off toward the end of the calculation.

For the rest of the parameters, you can read about them here. You can find the CUDA tuning information in the CUDA HPL version. For Intel, you can find it here.
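Tying the earlier numbers together, the relevant HPL.dat lines for the hypothetical 4-node example (say 32 MPI ranks total, so a 4x8 grid, with N rounded down to a multiple of NBs: 820 * 192 = 157440) might look like this. These values are illustrative starting points, not tuned results.

157440        Ns
1             # of NBs
192           NBs
1             # of process grids (P x Q)
4             Ps
8             Qs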

How to Set Up the Intel Compilers on a Cluster

Intel compilers like icc or icl are very useful for any cluster with Intel processors. They’ve been known to produce very efficient numerical code. If you are still a student, you can grab the student Intel Parallel Studio XE Cluster Edition, which includes Fortran and C/C++ for free for a year. Here’s our experience. If you need more information, definitely check out the official Intel Parallel Studio XE Cluster Edition guide.


You will need the GCC C and C++ compilers on all the machines. I am using CentOS 7, where you can install them with yum:

yum install gcc
yum install gcc-c++


Getting the Intel compilers and MPI libraries

I’m going to grab the student Intel® Parallel Studio XE Cluster Edition for Linux, which lasts for a year. The first thing to do is to join the Intel Developer Zone at the following link:

Fill in your information and choose an Intel User ID to create. Now you’ll have an account, but you’ll need to be a student to get the Intel compilers for free at:

Click on Linux underneath Intel Parallel Studio XE Cluster Edition. Check the items on the next page and fill in your e-mail before submitting. After submitting, you’ll receive an e-mail labeled “Thank You for Your Interest in the Intel® Software Development Products.”

The e-mail contains a product serial number that should last a year. The e-mail also contains a DOWNLOAD button that you should click.

After visiting the link, you’ll be brought to Intel® Parallel Studio XE Cluster Edition for Linux*. I prefer the Full Offline Installer Package (3994 MB). If you choose it, you will need to stay on that page and acquire your license file. In the red text, you’ll see the following sentence:

"If you need to acquire your license file now, for offline installation, please click here to provide your host information and download your license file."

Once you click the here link, you’ll be brought to a Sign In page to download your license file. After signing in, you’ll see the licenses available for download. Download your license file or e-mail it to yourself; it should be a .lic file.

At this point, you should have downloaded two files: parallel_studio_xe_2016_update2.tgz, the zipped archive of the Intel Parallel Studio XE Cluster Edition, and NCOM….lic, your license.


You should upload these two files to the shared folder of your cluster. My shared folder is /nfs, so I’ll send them there.

scp parallel_studio_xe_2016_update2.tgz NCOM...lic root@[head-node]:/nfs

Now, you can extract the tgz file by running:

ssh root@[head-node]
cd /nfs
tar -xvf parallel_studio_xe_2016_update2.tgz

We will put the license file into a Licenses folder in /root.

mkdir -p /root/Licenses
mv NCOM....lic /root/Licenses/



Now, we will set up the Intel compilers and MPI libraries.

cd parallel_studio_xe_2016_update2
./install.sh

It should say Initializing, please wait… until a text-based installer pops up. Type the number of the option you want and press Enter.

First, we need to activate. Hit 3 and press Enter.

Step 2 of 7 | License agreement
[Press space to continue, 'q' to quit.]

After pressing space a bunch of times, you’ll reach the end of the license.

Type 'accept' to continue or 'decline' to go back to the previous menu:

Type “accept.”

Please type a selection or press "Enter" to accept default choice [1]:

Please type your serial number (the format is XXXX-XXXXXXXX):

In another terminal, check the serial number, which will be inside /root/Licenses.
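For example, printing the license file should reveal the serial number in one of its fields:

cat /root/Licenses/*.lic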



Hit Enter, then choose the number options for the Intel compilers and libraries you want to install. You’ll see the installation of Intel MPI Benchmarks, Libraries, C++ Compiler, Fortran Compiler, and more. Using the script is the surest way to make sure that all the Intel libraries are installed correctly, but if you really only want specific libraries, then you’ll have to select the ones you want from inside the rpm/ folder. The full installation may take 15 minutes or more.

Press "Enter" key to continue:
Press "Enter" key to quit:

As for the final step, the paths for Intel may not be set up automatically. I am using CentOS 7 64-bit, so I’ll have to set up the environment for Intel 64-bit. We’ll have to adjust our ~/.bashrc.

vim ~/.bashrc

Add to the end of the file the following:
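The exact lines depend on your install prefix; assuming the default /opt/intel location, something like the following should work (the Intel MPI version directory, shown here as [version], will match whatever you installed):

source /opt/intel/bin/compilervars.sh intel64
source /opt/intel/impi/[version]/intel64/bin/mpivars.sh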

Save and quit. Note: your directories may be slightly different depending on the version of Intel Parallel Studio XE Cluster Edition you installed. Adjust those directories by checking that they actually exist on your system.

source ~/.bashrc

Now, you should be able to access and use the Intel compilers as expected.
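A quick sanity check that the compilers and MPI are on your PATH:

which icc
icc --version
mpirun --version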



When you first run your mpirun command with the Intel Parallel Studio XE Cluster Edition, you may receive an error about RLIMIT_MEMLOCK being too small.

mpirun -n 8 -f /nfs/hosts2 ./xhpcg --nx=16 --rt=60

The problem is that the memory lock limit is set statically, and it’s too small. On every machine that will run MPI, we should set memory lock to unlimited.

ulimit -l unlimited
ulimit -l

If the second command prints unlimited, we’ve set memory lock to unlimited for the current session. Now we have to make sure that it’s unlimited on every startup.

vi /etc/security/limits.conf

Go to the bottom of the file and add the following:

*            hard   memlock           unlimited
*            soft   memlock           unlimited

Save and quit. Now, if you run the MPI command again, you should not encounter any problems.
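Since this has to happen on every node, a small loop can append the limits everywhere at once. This assumes hypothetical host names node1 through node3 and passwordless root SSH:

for node in node1 node2 node3; do
    ssh root@$node 'printf "*    hard   memlock    unlimited\n*    soft   memlock    unlimited\n" >> /etc/security/limits.conf'
done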


Missing Hydra Files

You may come across an error about missing Hydra files. When you run mpirun, you may get:

bash: /usr/local/bin/hydra_pmi_proxy: No such file or directory

How I fixed the problem: I downloaded MPICH, a different MPI library, and compiled it with these instructions. Hydra binaries from MPICH should work with Intel MPI because both use the same Hydra process manager. I copied MPICH’s hydra binaries to a directory that was also added to the ~/.bashrc PATH.

cp /nfs/mpich2/bin/hydra_persist /nfs/mpich2/bin/hydra_nameserver /nfs/mpich2/bin/hydra_pmi_proxy /usr/local/bin

Then, I added /usr/local/bin to the ~/.bashrc PATH.

vim ~/.bashrc

Add the following line:

export PATH=/usr/local/bin:$PATH

Save the file. And then reload ~/.bashrc.

source ~/.bashrc

Do this on every node that is missing hydra_pmi_proxy. Afterwards, if you run mpirun again, it should work!
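A loop similar to the one above can push the Hydra binaries to every node (again assuming hypothetical host names and the MPICH build living on the shared /nfs folder):

for node in node1 node2 node3; do
    scp /nfs/mpich2/bin/hydra_pmi_proxy /nfs/mpich2/bin/hydra_persist /nfs/mpich2/bin/hydra_nameserver root@$node:/usr/local/bin/
done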