How to compile HPL (LINPACK)

This guide will show you how to compile HPL (Linpack) and provide some tips for selecting the best input values for hpl.dat based on my experiences at the student cluster competitions.

This benchmark stresses the computer's floating-point capabilities.

Although raw FLOPS alone does not reflect the applications typically run on supercomputers, floating-point performance still matters whenever precise numerical calculations are required.

I assume a version of MPI, C/C++/Fortran compilers, BLAS, and whatever other libraries you need are already installed.

There are many versions of LINPACK for different architectures, ranging from an Intel-optimized version to a CUDA version, and the modifications needed to build them are all very similar.

Compiling HPL

The first step is to copy one of the existing makefiles in the setup/ folder into the root directory of HPL. I suggest Make.Linux_ATHLON_CBLAS, since that is the closest to a generic system. Call this file Make.[whatever]. For CUDA and Intel, Make.CUDA and Make.intel64 are already provided.

In this file, you may or may not need to modify TOPdir. Typically it should be the full path to your HPL directory.
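For example, a minimal sketch assuming you unpacked HPL into your home directory (the directory name is just a placeholder):

TOPdir       = $(HOME)/hpl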

Next is specifying the location of your MPI headers, libraries, and binaries. MPdir should be the full path to the MPI installation you want to use, i.e. the root directory that contains include, lib, and bin.

MPinc should add that installation's include directory to the compiler's include path.

MPlib should point to the MPI library itself; the exact file name (libmpich.a in the sketch below) depends on the MPI version you installed.
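As a sketch, assuming MPICH is installed under /usr/local/mpich (a hypothetical path), these three lines would look something like:

MPdir        = /usr/local/mpich
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a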

** Note: if you are linking against a *.so instead of a *.a, then you need to add the library path to your environment.

vim ~/.bashrc

Add the following to the end of the file:

export LD_LIBRARY_PATH=/path/to/mpi/lib:$LD_LIBRARY_PATH

Then, reload the file; the library path will also be set automatically every time you log on to your system.

source ~/.bashrc


Next is linking the BLAS library. LAdir should specify the exact location of your BLAS installation.

LAinc should specify the include directory, if your BLAS library has headers that need to be included.

LAlib should specify the BLAS library file itself. If it is a *.so file, you can follow the steps above to add its directory to LD_LIBRARY_PATH.
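For instance, a sketch assuming OpenBLAS installed under /opt/openblas (both the library choice and the path are placeholders; point these at whatever BLAS you actually use):

LAdir        = /opt/openblas
LAinc        =
LAlib        = $(LAdir)/lib/libopenblas.a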


For HPL_OPTS, add -DHPL_DETAILED_TIMING so the run reports per-phase timings, which makes tuning the HPL.dat file much easier.


Lastly, you can specify your compiler (CC) and compiler flags (CCFLAGS).
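A hedged sketch of those last few lines (using mpicc and generic gcc optimization flags is only one option; pick whatever suits your compiler and CPU):

HPL_OPTS     = -DHPL_DETAILED_TIMING
CC           = mpicc
CCFLAGS      = $(HPL_DEFS) -O3 -funroll-loops -fomit-frame-pointer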

Now to compile:

make arch=[whatever]

If you linked everything correctly, then in the bin/[whatever]/ directory there should be an xhpl binary. Otherwise, you need to figure out which library was not linked properly.

Now navigate to bin/[whatever]/ to modify the HPL.dat file.

Modifying HPL.dat

The most important lines are:

  • Ns – the problem size (the order of the matrix)
  • NBs – the block size each process operates on at a time
  • Ps and Qs – the P x Q process grid the matrix is distributed over
Ns

N should typically be chosen so that the matrix fills around 80-90% of total memory.

N can be calculated by:
N = sqrt((Memory Size in GB * 1024^3 * Number of Nodes) / 8 bytes per double) * percentage
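For example, a quick shell sketch for 2 hypothetical nodes with 64 GB of memory each, using 85% of memory:

awk 'BEGIN { printf "%d\n", sqrt(64 * 1024^3 * 2 / 8) * 0.85 }'
111411

You would then round this down to a multiple of NB (covered next), e.g. 111360 for NB = 192.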

Larger Ns generally yield better results, as long as the matrix still fits in memory without swapping.

NBs

NBs typically falls between 32 and 256. A small NB works well on a single node, values in the 100-200 range work well across multiple nodes, and much larger values (around 1000 or more) work well for GPUs and other accelerators.

I normally select N to be a multiple of NB, so there is no performance drop-off toward the end of the calculation.
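Putting N, NB, P, and Q together, here is a hedged excerpt of the relevant HPL.dat lines for the hypothetical 2-node setup sketched above, running 8 MPI processes in total (the values are illustrative, not tuned for your hardware):

1            # of problems sizes (N)
111360       Ns
1            # of NBs
192          NBs
1            # of process grids (P x Q)
2            Ps
4            Qs

P times Q must equal the number of MPI processes you launch, and P is usually chosen to be less than or equal to Q.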

For the rest of the parameters, see the tuning notes included with the HPL source. The CUDA tuning information is distributed with the CUDA-enabled HPL version, and Intel documents its recommendations alongside the Intel-optimized binaries.

How to use MPI without NFS

You can use MPI without NFS or a shared file system! We had a situation where we couldn’t find the NFS server or client packages for arm64 for Ubuntu 16.04. We had OpenMPI version 1.10.2 installed on 2 nodes without NFS.

When you use MPI without NFS, you need to ensure that the same version of MPI is installed on every node.

Then, you have to ensure that the same data files, which include the program, hostnames file, and input files, are on every node at the same location relative to that node.

Lastly, you should double-check that every node has the same SSH key and that you have SSHed into every node at least once from the node that you're running the program from.

Step 1) Ensure that the same version of MPI is installed on every node.

We can check where OpenMPI (or whichever MPI you use) is installed on every node.

which mpicc
 /opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpicc

Make sure that this directory is consistent on each node. Now, we should check that the ~/.bashrc sets the same OpenMPI PATH folder on every node.

vi ~/.bashrc

Add the following line somewhere in the file if it is not already there.

export PATH="/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin:$PATH"

If you just added the above line, you will need to reload your ~/.bashrc. Do this for every node.

source ~/.bashrc

We can double check the version of MPI on every node by running:

mpirun --version
mpirun (Open MPI) 1.10.2
Report bugs to http://www.open-mpi.org/community/help/

Step 2) Same data files on every node.

What I would suggest is to compile the application on one node and then send the binary to all the other nodes, because running mpirun without NFS requires the exact same program at the same location on every node.

For instance, let’s say on node 1 called tegra1-ubuntu, I will compile a basic MPI hello world program.

cd ~
pwd
/ubuntu/home

Now, we use git to download the mpi hello world program.

git clone https://github.com/huyle333/mpi-hello-world
cd mpi-hello-world

We will need to compile the program. My suggestion is to use the full path of mpicc to compile the program. We know the full path of mpicc already since we used which mpicc earlier.

 /opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpicc -o mpi_hello_world mpi_hello_world.c

We will send the compiled binary, mpi_hello_world, to the same location on all the other nodes.

scp mpi_hello_world [email protected]:~

Now, we will have to create a file that contains the hostnames or IP addresses of the 2 nodes.

cd ~
vi hostnames

Instead of hostnames, you can put the IP addresses.

tegra1-ubuntu
tegra2-ubuntu

We send the hostnames file to the other node.

scp hostnames [email protected]:~

Step 3) Make sure that the SSH key is the same on every node.

First, we check if we have an SSH key.

ls ~/.ssh
id_rsa id_rsa.pub known_hosts authorized_keys
cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDJxIA4WSnXiJEWZ16SrRgGKOoIS6Z2sHSZreGKDggf+aJ2unEP5vtnFq07fmKDDxG+nMipTFpzx0bMB5ysXNZaTpnEKmW76BaO7402J/bIf/HsqZBMip39d+swkXkq9NB5yCHSn7+kmzf5PKaL34X8cNLOK6I5IZrqrHj8b10JyhORJ8URxa0VltItsblCvTUrdW5grR0+O8aY3UyzaZXLIwwYBF/vrQnt/bcPSA3j6lW829pUz+XsYOsKeit7aUep+ek0q1F3SYuPUoPe7vwp8+X+TiGBQTbraynZHVEov0ZJwWojw89Xc42qGtAiW1N+NrxkuaNXvJIHpua3ZCUdfJUXLlXfhOpFWZxU7F/C32Rj6x7kz6HJrjXkTaV3UD8puh7J2oVW8sGVOoKk99KPN0bztL//sj8UDVSD8rHxl5FanCHqBICIF+ZBrqcG6v3ElNcAq/KxpVEpypZndYa+FOwXvXJfBMg5IbDzgWXy6WAuK8bI8Iavk5UeRmAOGDvJzXG/30N06lmkQKnZYhtTQ4LY10Y0lbkNSCys7ceimRB3YKbVaoSxdbTiWzhNP2a7XTTmG/b1P022HdEYsZ9+9+iwyXRINmcvT3J+8QSsLryd3u/G5kWVX9iHnFPbEt3TRCZwJLkoQXxN0OTGFveaQpjMsui6Wpu3RKdcKMzY/w== [email protected]

We make sure that the contents of id_rsa.pub are somewhere inside the authorized_keys file. authorized_keys will give SSH access to any node as long as id_rsa.pub is inside the file.

cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDJxIA4WSnXiJEWZ16SrRgGKOoIS6Z2sHSZreGKDggf+aJ2unEP5vtnFq07fmKDDxG+nMipTFpzx0bMB5ysXNZaTpnEKmW76BaO7402J/bIf/HsqZBMip39d+swkXkq9NB5yCHSn7+kmzf5PKaL34X8cNLOK6I5IZrqrHj8b10JyhORJ8URxa0VltItsblCvTUrdW5grR0+O8aY3UyzaZXLIwwYBF/vrQnt/bcPSA3j6lW829pUz+XsYOsKeit7aUep+ek0q1F3SYuPUoPe7vwp8+X+TiGBQTbraynZHVEov0ZJwWojw89Xc42qGtAiW1N+NrxkuaNXvJIHpua3ZCUdfJUXLlXfhOpFWZxU7F/C32Rj6x7kz6HJrjXkTaV3UD8puh7J2oVW8sGVOoKk99KPN0bztL//sj8UDVSD8rHxl5FanCHqBICIF+ZBrqcG6v3ElNcAq/KxpVEpypZndYa+FOwXvXJfBMg5IbDzgWXy6WAuK8bI8Iavk5UeRmAOGDvJzXG/30N06lmkQKnZYhtTQ4LY10Y0lbkNSCys7ceimRB3YKbVaoSxdbTiWzhNP2a7XTTmG/b1P022HdEYsZ9+9+iwyXRINmcvT3J+8QSsLryd3u/G5kWVX9iHnFPbEt3TRCZwJLkoQXxN0OTGFveaQpjMsui6Wpu3RKdcKMzY/w== [email protected]

Make sure that the 2nd node has the ~/.ssh directory.

ssh [email protected]
ls ~/.ssh
known_hosts
exit

Back on node 1, we will send the SSH public and private key and authorized_keys file to node 2.

cd ~/.ssh
scp id_rsa id_rsa.pub authorized_keys [email protected]:~/.ssh

Step 4) Run the mpi program with full paths.

To make sure that the MPI program runs properly without NFS, run it with the full paths of the mpirun binary, the hostnames file, and the program itself.

[email protected]:~$ /opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun --hostfile /ubuntu/home/hostnames -n 8 /ubuntu/home/mpi-hello-world/mpi_hello_world
Hello world from processor tegra1-ubuntu, rank 2 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 3 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 1 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 0 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 7 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 5 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 6 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 4 out of 8 processors

You should see that the hello world is processed on both nodes! If you have input files used by your program, make sure that they are also in the same location on both nodes.

Leave a comment if you have any questions.

How to Fix OpenMPI ORTE Error: unknown option "--hnp-topo-sig"

We encountered an ORTE bug with the error message, Error: unknown option "--hnp-topo-sig", while using OpenMPI version 1.10.2 for arm64 on Ubuntu 14.04 server. More specifically, we ran the following command using 2 nodes with MPI:

mpirun --hostfile /nfs/hostnames -n 4 /nfs/mpi-hello-world/mpi_hello_world

ORTE errors can happen for a variety of reasons, but this one usually means that the mpirun you invoked does not come from the same MPI installation as the compiler you built the program with. Even if you think you only have one MPI version, you may in fact have several.

How we fixed the problem

First, we checked where our MPI C compiler was located.

which mpicc
/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpicc

Nothing odd here. The location of the MPI C compiler was where we expected. But, we had to check if we actually had multiple versions of MPI.

mpicc (press tab twice)
mpicc mpicc.openmpi

We saw a second version of the MPI C compiler on our machine! If you have the ORTE error and you indeed have two versions of MPI, you should use full paths when using mpirun. Let’s see if it works.

Test a small MPI program on 2 nodes

First, we change directory into our NFS shared folder.

cd /nfs
git clone https://github.com/huyle333/mpi-hello-world
cd mpi-hello-world

We want to compile the mpi-hello-world program with the full path of the MPI compiler that we are expecting.

/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpicc -o mpi_hello_world mpi_hello_world.c

mpi_hello_world is the created binary. Now, we test if we can use mpirun with 1 node. Use the full path of the mpirun command.

which mpirun
/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun
 [email protected]:~/nfs/mpi-hello-world$ /opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun -n 4 /nfs/mpi-hello-world/mpi_hello_world
Hello world from processor tegra1-ubuntu, rank 0 out of 4 processors
Hello world from processor tegra1-ubuntu, rank 1 out of 4 processors
Hello world from processor tegra1-ubuntu, rank 2 out of 4 processors
Hello world from processor tegra1-ubuntu, rank 3 out of 4 processors

Using mpirun on 1 node seems to work fine. Okay, now let’s try 2 nodes. /nfs/hostnames contains 2 IP addresses of the nodes that I want to use.

[email protected]:~/nfs/mpi-hello-world$ /opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun --hostfile /nfs/hostnames -n 8 /nfs/mpi-hello-world/mpi_hello_world
Hello world from processor tegra1-ubuntu, rank 2 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 3 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 1 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 0 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 7 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 5 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 6 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 4 out of 8 processors

Eureka! It works because we can see that hello world is triggered on both tegra1-ubuntu and tegra2-ubuntu.

Now for your actual program, use full paths to make sure that you are not running a different version of mpirun, and it should work.

Running MPI – Common MPI Troubleshooting Problems

In this post, I’ll list some common troubleshooting problems that I have experienced with MPI libraries after I compiled MPICH on my cluster for the first time. The following assumes that:

  1. You have at least 2 nodes as part of your cluster.
  2. You have MPI compiled inside an NFS (Network File System) shared folder.

I will divide the common problems into separate sections. For my first installation of an MPI library, I used MPICH from http://www.mpich.org/downloads/

 

MPI Paths on each Node

I placed my MPICH libraries and binaries inside a shared folder, /nfs/mpich3. Every machine connected over Ethernet or InfiniBand (through a switch or directly) should have the binary and library paths configured so that MPI binaries like mpirun can be found. To configure your MPI paths, edit the ~/.bashrc.

vim ~/.bashrc
export PATH=/nfs/mpich3/bin:$PATH
export LD_LIBRARY_PATH=/nfs/mpich3/lib:$LD_LIBRARY_PATH

Save and exit. Then, to load the new ~/.bashrc file:

source ~/.bashrc

 

How Do You Actually Run an MPI Program?

First, you should create a hosts file with all the IPs that you want to run MPI on. All of these machines should have the MPI path configured properly.

vim /nfs/hosts

Inside this hosts file, list the IP addresses of every machine, one per line, including the machine that you are on.
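For example, with six hypothetical nodes on a private network, the file might look like:

192.168.1.101
192.168.1.102
192.168.1.103
192.168.1.104
192.168.1.105
192.168.1.106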

To find the IP address of a machine, use the following command:

ip addr show

Save and quit. Now you'll be ready to run your MPI command. The first thing to know about MPI is that the launcher binary is mpirun, and you tell it how many processes (typically one per core) to start. To determine how many cores are on each machine, run the following:

less /proc/cpuinfo

Scroll down and count how many processor entries you see; that is the number of logical cores on the machine. To start, let's say that I have a ./mpi_hello_world binary and 6 machines with 4 cores each.

mpirun -f /nfs/hosts -n 24 ./mpi_hello_world

With the above command, I would have run the MPI hello world program across 24 cores spread over the IPs listed in the /nfs/hosts file. For sample MPI programs, use git to download this popular repository of samples:

git clone https://github.com/wesleykendall/mpitutorial

Check the tutorials folder inside mpitutorial, compile some of the programs, and try to use mpirun on the binaries!
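For instance, a rough sketch of compiling and running one of the samples (the folder layout inside the repository may differ from what is shown here):

cd mpitutorial/tutorials/mpi-hello-world/code
mpicc -o mpi_hello_world mpi_hello_world.c
mpirun -f /nfs/hosts -n 24 ./mpi_hello_world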

 

Firewall Blocking MPI

After you feel that you have configured MPI properly, you may encounter an error where MPI cannot communicate with other nodes. We could check to see if we have the appropriate ports open, but the easy way is to drop the firewall for quick testing.

On CentOS, for every connected machine, run the following:

systemctl stop firewalld
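If your node list already lives in /nfs/hosts, a quick sketch to stop the firewall on every node from one machine (assuming passwordless SSH and sudo are set up):

for host in $(cat /nfs/hosts); do
    ssh "$host" "sudo systemctl stop firewalld"
done

Remember to start firewalld again, with the needed ports opened, once you are done testing.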

Now, try your MPI command again!

 

Password Prompt

If the machines ask you for a password when you run an MPI command, then you haven't set up your SSH keys properly. You should use SSH keys, with your key authorized on every node.

To generate an SSH key, first you should make sure that you have the ~/.ssh directory.

mkdir ~/.ssh

Check if you have an SSH key already.

ls ~/.ssh

If you see an id_rsa.pub or another .pub file, that is your public SSH key. If not, you can generate a standard SSH key with:

ssh-keygen -t rsa -b 4096 -C "[email protected]"

After making the SSH key, add the contents of ~/.ssh/id_rsa.pub to a file called ~/.ssh/authorized_keys on every connected machine, creating the file if it does not exist or appending your key on a new line if it does. Once your public key is in ~/.ssh/authorized_keys on every connected machine, you should be able to SSH into each of them without a password prompt.
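If ssh-copy-id is available on your system, it automates appending your public key to the remote authorized_keys file; the user and host below are just placeholders:

ssh-copy-id user@node2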

 

Host Key Verification Failed

Still, you might get a host key verification problem! Let's say that I have a file, /nfs/hosts, listing the IPs of every connected machine, to be used with mpirun. When I try the command:

mpirun -f /nfs/hosts -n 4 ./mpi_hello_world
Host key verification failed.
Host key verification failed.
Host key verification failed.

SSH keys are set up. The firewall is shut down. This problem happens because the machine you launch MPI from must have SSHed at least once into every node listed in the /nfs/hosts file, so that each host's key is recorded in ~/.ssh/known_hosts. SSH into each machine whose IP is in the hosts file at least once so the launching machine has this history.

 

SSH to each node at least once

MPI might not work if you have not SSHed to each node at least once. Say that my /nfs/hosts file contains 6 IP addresses and that, from node 1, I want to run:

mpirun -n 24 -f /nfs/hosts ./mpi_hello_world

Before I run the above command, I must SSH from node 1 into nodes 2 through 6 at least once so that ~/.ssh/known_hosts gets updated.
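One way to avoid SSHing into every node by hand is to pre-populate known_hosts with ssh-keyscan, assuming /nfs/hosts lists one hostname or IP per line:

for host in $(cat /nfs/hosts); do
    ssh-keyscan -H "$host" >> ~/.ssh/known_hosts
done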

 

Unable to get host address

You might get the following error:

[proxy:0:[email protected]] HYDU_sock_connect (../../utils/sock/sock.c:224): unable to get host address for buhpc1 (1)
[proxy:0:[email protected]] main (../../pm/pmiserv/pmip.c:415): unable to connect to server buhpc1 at port 46951 (check for firewalls!)

In this scenario, I am using buhpc1 and buhpc4 for MPI. But wait, we already shut off the firewall on both machines. This problem actually happens when the hostnames in your hostfile cannot be resolved, either because your DNS configuration is wrong or because you cannot reach a DNS server at all.

To fix the problem, you need to edit /etc/hosts on all the machines that you want to run MPI on.

vim /etc/hosts

Originally, the file will only contain the default loopback entries.

Then, add a line for each node that maps its IP address to its hostname (and its fully qualified domain name, if it has one). If you don't use domain names, the short hostname alone is enough, as in the sketch below.

After saving /etc/hosts on both machines with these settings, you should be able to run MPI as expected.