This guide will show you how to compile HPL (Linpack) and provide some tips for selecting good input values for HPL.dat, based on my experiences at student cluster competitions.

This benchmark stresses the computer's floating-point capabilities.

Although raw FLOPS is not representative of the applications typically run on supercomputers, floating-point performance still matters when precise calculations are required.

I assume a version of MPI, C/C++/Fortran compilers, BLAS, and whatever other libraries you need are already installed.

There are many versions of Linpack for different architectures, ranging from an Intel version to a CUDA version. The modifications for all versions are very similar. Below I have linked some of the different versions.

Compiling HPL

The first step is to copy an existing makefile from the setup/ folder into the root directory of HPL. I suggest Make.Linux_ATHLON_CBLAS, since it is the closest to a generic system. Call this file Make.[whatever]. For CUDA and Intel, Make.CUDA and Make.intel64 are already created for you.
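
For example, assuming your HPL source tree is the current directory and you call your build "mycluster" (any name works, as long as it matches the arch you pass to make later):

cp setup/Make.Linux_ATHLON_CBLAS Make.mycluster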

Here, you may or may not need to modify TOPdir. Typically it should be the full path to your HPL directory.
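
For instance, if HPL was unpacked into your home directory (the path below is just a placeholder):

TOPdir = $(HOME)/hpl-2.3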

Next is specifying the location of your MPI files and binaries. MPdir should specify the exact file path to the version of MPI you want to use, up to the root where include, lib, and bin are located.

MPinc should point to the include directory under MPdir, e.g. -I$(MPdir)/include.

MPlib should point to the MPI library under MPdir; the exact file name (libmpich.a in the stock makefile) depends on the MPI implementation you installed.
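
Putting the three together, assuming an MPICH installation under /usr/local/mpich (adjust the path and library name for your system):

MPdir = /usr/local/mpich
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/libmpich.a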

** Note: if you are linking against a *.so instead of a *.a, then you need to add the library path to your environment.

vim ~/.bashrc

Add the following to the end of the file:

export LD_LIBRARY_PATH=/path/to/mpi/lib:$LD_LIBRARY_PATH

Then reload the file; the path will also be set automatically every time you log on to your system.

source ~/.bashrc


Next is linking the BLAS libraries. LAdir should specify the directory where your BLAS library lives.

LAinc should specify the include directory, if you need one.

LAlib should specify the BLAS library file itself. If the BLAS library is a *.so file, you can follow the steps above to add its directory to your environment.
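
As an illustration, assuming OpenBLAS was built as a static library under /usr/local/openblas (substitute your own BLAS path and file name):

LAdir = /usr/local/openblas/lib
LAinc =
LAlib = $(LAdir)/libopenblas.a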


For HPL_OPTS, add -DHPL_DETAILED_TIMING to get per-phase timing output, which makes tuning the HPL.dat file easier.
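
If you started from the CBLAS makefile, HPL_OPTS should already contain -DHPL_CALL_CBLAS; appending the timing flag would look something like:

HPL_OPTS = -DHPL_CALL_CBLAS -DHPL_DETAILED_TIMING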


Lastly, you can specify your compiler (CC) and compiler flags (CCFLAGS).
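
A common choice is to let the MPI wrapper drive compilation and turn on optimizations; for example (these flags are just a starting point, not a recommendation for every system):

CC      = mpicc
CCFLAGS = $(HPL_DEFS) -O3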

Now to compile:

make arch=[whatever]

If you linked everything correctly, there should be an xhpl binary in the bin/[whatever]/ directory. Otherwise, you need to figure out which library was not linked properly.

Now navigate to bin/[whatever]/ to modify the HPL.dat file.
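
Once HPL.dat is set up, a quick test run from that directory (assuming 4 MPI processes; the process count must match P x Q in HPL.dat) looks something like:

mpirun -np 4 ./xhpl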

Modifying HPL.dat

The most important lines are:

  • Ns – the problem size (the order of the matrix)
  • NBs – the block size each process works on at a time
  • Ps and Qs – the P x Q process grid the matrix is distributed over

Ns

N should typically be chosen so the matrix takes up around 80-90% of total memory.

N can be calculated by:
N = sqrt((Memory Size in Gbytes * 1024^3 * Number of Nodes) / Double Precision (8 bytes)) * percentage
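
As a worked example with hypothetical numbers: four nodes with 64 GB of memory each, targeting 85% of memory, gives N = sqrt((64 * 1024^3 * 4) / 8) * 0.85 ≈ 157,000. In practice you would round this down to a multiple of NBs, e.g. 157,440 for NBs = 192.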

Larger Ns generally yield better results, as long as the problem still fits in memory.

NBs

NBs is typically in the range of 32 to 256. A small NBs is good for a single node; an NBs in the low 100s to low 200s is good for multiple nodes; and a larger NBs, close to or above 1000, is good for GPUs and accelerators.

I normally select Ns to be a multiple of NBs, so there is no performance dropoff toward the end of the calculation.
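
Putting it together, the relevant portion of HPL.dat for the hypothetical four-node example above (one MPI process per node on a 2 x 2 grid, with a threaded BLAS using the cores within each node) might look like the excerpt below; the surrounding lines of the file are left at their defaults:

1            # of problems sizes (N)
157440       Ns
1            # of NBs
192          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs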

For the rest of the parameters, you can read about them here. You can find the CUDA tuning information in the CUDA HPL version. For Intel, you can find it here.