HPCG, which stands for High Performance Conjugate Gradients, is a benchmark project to create a new metric for ranking HPC systems. HPCG measures the performance of basic operations including sparse matrix-vector multiplication, sparse triangular solve, vector updates, global dot products and more. The implementation is written in C++ with MPI and OpenMP support.
HPCG Reference Code
There are versions of HPCG optimized for NVIDIA GPUs and for Intel Xeon Phi. For this blog post, I’ll show you how to compile the HPCG 3.0 Reference Code. If you want one of the optimized versions instead, get the latest release with your desired optimizations.
ssh [email protected]
cd /nfs
wget http://www.hpcg-benchmark.org/downloads/hpcg-3.0.tar.gz
tar -xvf hpcg-3.0.tar.gz
cd hpcg-3.0/
The INSTALL file contains very useful instructions for compilation. First, we need to compose our Makefile.
cd setup/
ls
You will see several Makefiles whose extensions indicate which compilers and libraries they expect. I am assuming that you have an MPI library, so we’ll select Make.MPI_GCC_OMP, which stands for MPI, gcc, and OpenMP, as our base file. I’ll copy the file under a new name so that we have a fresh Makefile to adjust for our cluster.
cp Make.MPI_GCC_OMP Make.MPI_OPENMP
vim Make.MPI_OPENMP
Scroll down to the – Message Passing Library (MPI) – section. We need to set up these variables. For my cluster:
MPdir = /nfs/mpich2
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/libmpi.a
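Before building, it can save time to confirm that the files these variables point at really exist. Here is a minimal sketch, assuming the layout shown above (mpi.h under include/ and libmpi.a under lib/). check_mpi_paths is a hypothetical helper of mine, not part of HPCG, and some MPI installs name the library differently, so adjust the file names to match yours.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HPCG): check that an MPI install
# contains the header and library that MPinc and MPlib refer to.
check_mpi_paths() {
  local mpdir="$1"
  local status=0
  # MPinc = -I$(MPdir)/include, so mpi.h is expected here
  if [ ! -f "$mpdir/include/mpi.h" ]; then
    echo "missing: $mpdir/include/mpi.h"
    status=1
  fi
  # MPlib = $(MPdir)/lib/libmpi.a (assumed name; yours may differ)
  if [ ! -f "$mpdir/lib/libmpi.a" ]; then
    echo "missing: $mpdir/lib/libmpi.a"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "ok: $mpdir"
  fi
  return "$status"
}
```

On my cluster I would call it as check_mpi_paths /nfs/mpich2 and fix MPdir if anything is reported missing.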
You can adjust other parameters to fit your needs, but for now, we’ll just make sure that hpcg can find MPI. Now, we should test our new Makefile.
cd ../
mkdir build_MPI_OPENMP
cd build_MPI_OPENMP/
We run the configure script with its full path and pass it the MPI_OPENMP extension.
/nfs/hpcg-3.0/configure MPI_OPENMP
make
make will create an executable called xhpcg in the /nfs/hpcg-3.0/build_MPI_OPENMP/bin folder.
hpcg.dat contains the run parameters: the problem dimensions and the duration in seconds. For the HPCG Reference Code, we have found that dimensions of 16 16 16 give the best performance. For official results, xhpcg needs to run for more than 30 minutes. Do not use exactly 1800 seconds: allow at least 2 minutes beyond the 30 minutes to account for system real time.
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
16 16 16
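If you change these values often, the input file can be generated from a script. This is a minimal sketch, assuming the usual hpcg.dat layout in which a fourth line holds the run time in seconds. write_hpcg_dat is a hypothetical helper name, and the output path defaults to hpcg.dat in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an hpcg.dat file.
# Usage: write_hpcg_dat NX NY NZ SECONDS [FILE]
# Assumes the four-line layout: two header lines, the dimensions,
# then the run time in seconds.
write_hpcg_dat() {
  local nx="$1" ny="$2" nz="$3" seconds="$4" out="${5:-hpcg.dat}"
  {
    echo "HPCG benchmark input file"
    echo "Sandia National Laboratories; University of Tennessee, Knoxville"
    echo "$nx $ny $nz"
    echo "$seconds"
  } > "$out"
}
```

For example, write_hpcg_dat 16 16 16 1920 asks for 30 minutes plus the 2-minute margin mentioned above.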
But before you run xhpcg for 30 minutes, test for 60 seconds first. We have found that a 60-second run is a decent indicator of the performance of a 30-minute run. We did not find any statistically significant increase or decrease in performance from scaling the time. You can also specify the parameters from hpcg.dat on the command line. Here’s how you run HPCG with MPI.
mpirun -n 24 -f /root/nfs/hosts ./xhpcg --nx=16 --rt=60
--nx sets the x dimension. The dimensions must be divisible by 8.
--rt sets the runtime in seconds.
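Since a bad dimension only shows up once the job has started, I find it handy to check it first. The sketch below assumes the divisible-by-8 rule stated above. dims_ok and hpcg_cmd are hypothetical helper names, and hpcg_cmd only prints the command (with the process count and hosts file from my example) rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every argument is a positive
# integer that is divisible by 8.
dims_ok() {
  local d
  [ "$#" -gt 0 ] || return 1
  for d in "$@"; do
    case "$d" in
      ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    # 10# forces base 10 so a value like 08 is not read as octal
    if [ "$d" -le 0 ] || [ $((10#$d % 8)) -ne 0 ]; then
      return 1
    fi
  done
  return 0
}

# Hypothetical helper: print the mpirun command for a given nx and
# runtime, refusing dimensions that fail the check above.
hpcg_cmd() {
  local nx="$1" rt="$2"
  dims_ok "$nx" || { echo "nx must be a positive multiple of 8: $nx" >&2; return 1; }
  echo "mpirun -n 24 -f /root/nfs/hosts ./xhpcg --nx=$nx --rt=$rt"
}
```

Calling hpcg_cmd 16 60 prints the same command as above, while hpcg_cmd 12 60 stops with an error before anything is launched.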
If you do not pass any flags to ./xhpcg, the dimensions and duration are taken from hpcg.dat. After the benchmark finishes its 60-second run, you’ll find .yaml files, which contain the results. To find the performance, scroll to the bottom of the .yaml file and look for:
HPCG result is VALID with a GFLOP/s rating of: ...
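When you are comparing several runs, scrolling through each file gets tedious. Here is a minimal sketch that pulls out just the number, assuming the result line appears in the file exactly as quoted above. hpcg_rating is a hypothetical helper name, and it prints nothing for a file that has no VALID result line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the GFLOP/s rating from one or more
# HPCG .yaml result files, one rating per matching line.
hpcg_rating() {
  # Strip everything up to and including the label, keep the value.
  sed -n 's/.*HPCG result is VALID with a GFLOP\/s rating of:[[:space:]]*//p' "$@"
}
```

Run it as hpcg_rating *.yaml in the folder where xhpcg wrote its output.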
The number of GFLOP/s is the indicator of your performance. Test for different dimension sizes, and once you have found a set of dimensions that you like, scale the timing to at least 32 minutes. I prefer running HPCG for an hour.
nohup mpirun -n 24 -f /root/nfs/hosts ./xhpcg --nx=16 --rt=3600 &
Hit enter, and the process will run in the background until it completes or is killed. See how we set up our configurations for HPCG Reference Code here:
Let me know if you have any problems setting up HPCG for the first time.
Intel® MKL Benchmarks – includes optimized HPCG for Intel Xeon and Xeon Phi
We do not have a local cluster that can run HPCG for Intel Xeon and Xeon Phi. Here’s what we do know though. You can find the link to the optimized HPCG for Intel Xeon and Xeon Phi on the HPCG software releases page.
This optimized version of HPCG is quite different, and you must have Intel Xeon and/or Xeon Phi hardware. First, let’s talk about the Makefiles. Their extensions have some familiar names like IMPI (Intel MPI), MPICH, OPENMPI, etc. But the final underscore extensions are different. What do they stand for?
AVX = Advanced Vector Extensions
AVX2 = Advanced Vector Extensions 2
KNL = Knights Landing Architecture
MIC = Many Integrated Core (Intel Xeon Phi coprocessors)
OFFLOAD = offload mode (a Xeon host offloads work to Xeon Phi coprocessors)
If you use an AVX2 Makefile when your CPU doesn’t support AVX2, you will get the following error:
[ERROR] Sorry, this benchmark can run on Intel(R) AVX2 enabled processors only:
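You can avoid this error by checking what the CPU reports before picking a Makefile. This is a minimal sketch, assuming a Linux system that lists CPU features on the flags line of /proc/cpuinfo. has_cpu_flag is a hypothetical helper name, and the second argument exists only so that the check can be pointed at another file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the CPU advertises the given flag.
# Usage: has_cpu_flag FLAG [CPUINFO_FILE]
has_cpu_flag() {
  local flag="$1" cpuinfo="${2:-/proc/cpuinfo}"
  # Take the first flags line, put one word per line, then require a
  # whole-line match so that "avx" does not match "avx2".
  grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -x "$flag" > /dev/null
}
```

For example, has_cpu_flag avx2 && echo "AVX2 supported" tells you whether the AVX2 Makefile is an option on that node.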