For this blog post, I will be building on top of the virtual machine clustering and NFS server setup from part 1. Before you follow the steps in this post, you will want a setup similar to the one described in part 1.
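MPI stands for Message Passing Interface. MPI isn't your average networking library. It is a standard for passing messages between the processes of a parallel program, whether they run on one machine or across many, and implementations of it (such as MPICH, which we use in this post) are available for nearly all operating systems. It is optimized for performance and uses the fastest transport available, for example shared memory between processes on the same machine.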
Virtual Machine Information
With the steps from part 1, I have two virtual machines connected on an internal network: one is the NFS server and the other is the NFS client.
Setting up SSH Keys
Currently, we can SSH from machine #1 to machine #2. On machine #1, run:
ssh email@example.com exit
The problem is that we receive a password prompt every time we SSH across the internal network to the other machine. MPICH launches processes on remote machines over SSH, so we want to set up SSH keys that let each machine log in to the other without a password.
Let's make sure that we're on machine #1. Before we create the SSH key, create the ~/.ssh folder and move into it:
mkdir -p ~/.ssh
cd ~/.ssh
ssh-keygen -t rsa -b 4096 -C "firstname.lastname@example.org"
Press Enter at each of the next three prompts to accept the default key location and an empty passphrase. The empty passphrase is what lets SSH log in without prompting.
Enter file in which to save the key (/home/you/.ssh/id_rsa): [Press enter]
Enter passphrase (empty for no passphrase): [Press enter]
Enter same passphrase again: [Press enter]
Your identification has been saved in /home/you/.ssh/id_rsa.
Your public key has been saved in /home/you/.ssh/id_rsa.pub.
The key fingerprint is:
01:0f:f4:3b:ca:85:d6:17:a1:7d:f0:68:9d:f0:a2:db email@example.com
We will copy the public key, id_rsa.pub, to authorized_keys so that this key can be used to log in to machine #1.
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
Now, send the private key, id_rsa, and the public key, id_rsa.pub, from machine #1 to machine #2 with scp, a command for copying files between machines over SSH.
scp ~/.ssh/id_rsa ~/.ssh/id_rsa.pub firstname.lastname@example.org:
On machine #2, the private key and public key have arrived in the home directory. Create the ~/.ssh directory on machine #2:
mkdir -p ~/.ssh
Now, we copy the id_rsa and id_rsa.pub to the ~/.ssh folder.
cp id_rsa id_rsa.pub ~/.ssh
Copy id_rsa.pub to authorized_keys so that machine #1 can SSH to machine #2 without a password.
cd ~/.ssh
cp id_rsa.pub authorized_keys
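If key login still asks for a password, file permissions are a common culprit. With the default StrictModes setting, sshd ignores authorized_keys when it or ~/.ssh is writable by other users, and the ssh client refuses a private key that other users can read. A minimal sketch, using a hypothetical helper named fix_ssh_perms that takes the directory as an argument (defaulting to ~/.ssh) and would be run on both machines:

```shell
#!/usr/bin/env bash
# Hypothetical helper: tighten permissions inside an SSH directory so that
# sshd (StrictModes) and the ssh client accept the files in it.
fix_ssh_perms() {
  local dir="${1:-$HOME/.ssh}"
  local f
  # Directory: only the owner may read, write, or enter it.
  chmod 700 "$dir" || return 1
  # Private key and authorized_keys: owner read/write only.
  for f in "$dir/id_rsa" "$dir/authorized_keys"; do
    if [ -e "$f" ]; then
      chmod 600 "$f" || return 1
    fi
  done
  # The public key is not secret, so it may stay world-readable.
  if [ -e "$dir/id_rsa.pub" ]; then
    chmod 644 "$dir/id_rsa.pub" || return 1
  fi
  return 0
}

# Usage, on each machine:
# fix_ssh_perms            # defaults to ~/.ssh
```

Running the plain chmod commands by hand does the same thing; the function just skips files that do not exist yet.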
We should now be able to SSH from machine #1 to machine #2 without a password, and vice versa, because both machines hold the same key pair and list its public key in authorized_keys.
On machine #1: ssh email@example.com
On machine #2: ssh firstname.lastname@example.org
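To confirm that the password prompt is really gone, you can ask ssh to fail instead of prompting. The BatchMode=yes option disables password and passphrase prompts (and the yes/no prompt for unknown host keys), so the check either succeeds quietly or exits with an error. check_passwordless is a hypothetical helper; pass it the same user@address you used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if key-based login to "$1" works
# without any interactive prompt.
check_passwordless() {
  local target="$1"
  if [ -z "$target" ]; then
    echo "usage: check_passwordless user@host" >&2
    return 2
  fi
  # BatchMode=yes: never prompt for a password, passphrase, or host key.
  # ConnectTimeout=5: give up quickly if the host is unreachable.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
    echo "OK: passwordless SSH to $target"
  else
    echo "FAILED: $target still needs a password (or is unreachable)" >&2
    return 1
  fi
}
```

Because BatchMode also suppresses the host key question, connect once interactively first so the other machine is already in known_hosts.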
Installing MPICH
First, we'll need a tool called wget, which downloads files from a URL onto the machine. We will install wget on machine #1.
yum install wget -y
Next, we download the source code of MPICH, an implementation of MPI. We download it into the shared folder, /nfs, so that both machines can see it.
cd /nfs
wget http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz
After downloading MPICH, we install the C and C++ compilers, the Fortran compiler, and the kernel development headers on both machine #1 and machine #2.
yum install gcc gcc-c++ gcc-gfortran kernel-devel -y
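MPICH 3.1.4's configure step looks for C, C++, and Fortran compilers, and it stops with an error if it cannot find a Fortran compiler (unless Fortran support is disabled). A quick check that the tools are on PATH can save a failed configure run. missing_tools is a hypothetical helper name, and the tool list in the usage line is an assumption about what this build needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command from the argument list that is
# not on PATH; return non-zero if at least one is missing.
missing_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, on each machine:
# missing_tools gcc g++ gfortran make wget || echo "install the missing packages first"
```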
To extract the downloaded archive, run the following command in /nfs. It creates an mpich-3.1.4 folder containing the extracted files.
tar -xvf mpich-3.1.4.tar.gz
We make the directory where MPICH's compiled binaries and libraries will be installed:
mkdir /nfs/mpich3
Now, we configure MPICH for installation. The --prefix option sets the directory that make install will install into.
cd /nfs/mpich-3.1.4
./configure --prefix=/nfs/mpich3
Then we compile and install MPICH:
make
make install
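The download, extract, configure, and build steps can also be collected into one function that stops at the first failing step. This is a sketch, not the method used in this post: build_mpich is a hypothetical name, the parallel build with make -j"$(nproc)" is an addition, and building other versions assumes they use the same download URL layout as 3.1.4. It would run on machine #1 only; machine #2 sees the result through the NFS mount.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download, build, and install MPICH into the shared
# NFS prefix, stopping at the first step that fails.
# The body is wrapped in ( ) so it runs in a subshell: the cd commands do
# not affect the calling shell, and "exit 1" leaves only the subshell.
build_mpich() (
  version="${1:-3.1.4}"
  prefix="${2:-/nfs/mpich3}"
  workdir="${3:-/nfs}"
  tarball="mpich-${version}.tar.gz"

  cd "$workdir" || exit 1
  # Download only if the tarball is not already there.
  if [ ! -f "$tarball" ]; then
    wget "http://www.mpich.org/static/downloads/${version}/${tarball}" || exit 1
  fi
  tar -xzf "$tarball" || exit 1
  mkdir -p "$prefix" || exit 1
  cd "mpich-${version}" || exit 1
  ./configure --prefix="$prefix" || exit 1
  make -j"$(nproc)" || exit 1      # parallel build on every available core
  make install
)

# Usage, on machine #1 only:
# build_mpich 3.1.4 /nfs/mpich3 /nfs
```

Explicit `|| exit 1` checks are used instead of `set -e` because bash ignores `set -e` inside a function called as part of an `||` or `&&` list.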
If we cd into /nfs/mpich3, we will see folders containing MPICH's binaries and libraries. Inside /nfs/mpich3/bin are MPI binaries such as mpirun.
Right now, mpirun only works if we type its full path. We need to edit ~/.bashrc on machine #1 and machine #2 so that the shell can find the MPI commands from any directory.
On both machines, open ~/.bashrc (for example, with vi ~/.bashrc) and add the following two lines at the bottom:
export PATH=/nfs/mpich3/bin:$PATH
export LD_LIBRARY_PATH="/nfs/mpich3/lib:$LD_LIBRARY_PATH"
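If you script this step instead of editing the file by hand, running the script twice would add the lines twice. A minimal sketch with a guard against duplicates; add_line_once and add_mpich_env are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE unless FILE already has that exact line.
add_line_once() {
  local file="$1" line="$2"
  touch "$file" || return 1
  # If the file does not end in a newline, add one so LINE starts on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  # -q: quiet, -x: whole-line match, -F: LINE is a fixed string, not a regex.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: add the two MPICH lines from this post to an rc file.
add_mpich_env() {
  local rc="${1:-$HOME/.bashrc}"
  # Single quotes keep $PATH and $LD_LIBRARY_PATH literal in the file,
  # so they are expanded each time a new shell reads it.
  add_line_once "$rc" 'export PATH=/nfs/mpich3/bin:$PATH' || return 1
  add_line_once "$rc" 'export LD_LIBRARY_PATH="/nfs/mpich3/lib:$LD_LIBRARY_PATH"'
}

# Usage, on each machine:
# add_mpich_env            # defaults to ~/.bashrc
```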
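PATH tells the shell which directories to search for commands such as mpirun, and LD_LIBRARY_PATH tells the dynamic linker which directories to search for shared libraries, such as the ones in /nfs/mpich3/lib. To reload ~/.bashrc, run the following command on both machines:
source ~/.bashrc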
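After reloading, you can check that the shell finds this MPICH build rather than some other MPI installation. check_mpich_env is a hypothetical helper that assumes the /nfs/mpich3 prefix used above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the shell resolves mpirun to the
# MPICH build under PREFIX and whether PREFIX/lib is on LD_LIBRARY_PATH.
check_mpich_env() {
  local prefix="${1:-/nfs/mpich3}"
  local found
  found="$(command -v mpirun)" || { echo "mpirun not found on PATH" >&2; return 1; }
  if [ "$found" != "$prefix/bin/mpirun" ]; then
    echo "mpirun resolves to $found, not $prefix/bin/mpirun" >&2
    return 1
  fi
  # Wrap in colons so the match only hits a whole PATH-style entry.
  case ":$LD_LIBRARY_PATH:" in
    *":$prefix/lib:"*) ;;
    *) echo "$prefix/lib is missing from LD_LIBRARY_PATH" >&2; return 1 ;;
  esac
  echo "OK: $found"
}

# Usage, on each machine after source ~/.bashrc:
# check_mpich_env
```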
Using MPI binaries: Running MPI
Let's make a folder in the shared directory for our projects.
mkdir /nfs/projects
cd /nfs/projects
We need to create a file named hosts that lists the IP addresses of all the machines we want MPI to run on, one per line.
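The hosts file is plain text with one host per line; MPICH's Hydra launcher also accepts host:N to give a host N process slots. A minimal sketch using a hypothetical write_hosts helper. Only 10.0.1.3 comes from this post; SERVER_IP in the usage line is a placeholder for your NFS server's address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one IP address (or hostname) per line to FILE.
write_hosts() {
  local file="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: write_hosts FILE HOST..." >&2
    return 2
  fi
  printf '%s\n' "$@" > "$file"
}

# Usage, in /nfs/projects (replace SERVER_IP with the server's real address):
# write_hosts hosts SERVER_IP 10.0.1.3
```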
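MPI processes on different machines talk to each other over TCP connections on ports chosen at run time, and firewalld blocks such incoming connections by default. To let the processes reach each other, we stop firewalld on both machines: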
systemctl stop firewalld
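Stopping firewalld lasts only until the next reboot (systemctl disable firewalld keeps it from starting at boot). A narrower option is to leave firewalld running and add the internal subnet to its trusted zone, which accepts all traffic from those sources. A sketch, assuming the internal network is 10.0.1.0/24 (inferred from the 10.0.1.3 address in this post; adjust to yours); trust_subnet is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: instead of stopping firewalld, put the internal
# subnet in firewalld's "trusted" zone so traffic from it is accepted.
trust_subnet() {
  local subnet="$1"
  if [ -z "$subnet" ]; then
    echo "usage: trust_subnet CIDR   (for example 10.0.1.0/24)" >&2
    return 2
  fi
  # --permanent writes the rule to the saved configuration...
  firewall-cmd --permanent --zone=trusted --add-source="$subnet" || return 1
  # ...and --reload applies the saved configuration to the running firewall.
  firewall-cmd --reload
}

# Usage, as root on both machines (the subnet is an assumption):
# trust_subnet 10.0.1.0/24
```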
We can test mpirun with an ordinary Linux command. The -f flag selects the hosts file, which lists the IP addresses where MPI may run the program. The -n flag sets the number of processes to launch, which usually matches the number of CPU cores you want to use. You may have to be on the client machine, in this case 10.0.1.3, for the command to run without errors.
mpirun -f hosts -n 4 echo "hello world"
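If each machine is already in the other's known_hosts file (which happens after the first SSH connection in the SSH keys section), the mpirun command prints:
hello world
hello world
hello world
hello world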
mpirun has run echo "hello world" 4 times because we asked for 4 processes. When mpirun launches a serial command (one not written to use MPI), it starts that many independent copies of the command, and the copies do not communicate with each other.
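Because mpirun starts -n copies of whatever command it is given, replacing echo with hostname shows where each copy actually ran. A minimal sketch of a hypothetical count_per_host helper that tallies that output; if processes were placed on both machines, both hostnames appear in the tally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one hostname per line on stdin (for example the
# output of "mpirun -f hosts -n 4 hostname") and print how many processes
# reported each hostname, as "name: count".
count_per_host() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage, on machine #1 from /nfs/projects:
# mpirun -f hosts -n 4 hostname | count_per_host
```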
top is a useful Linux command for seeing which processes are running on a machine. Let's start it on machine #2:
top
And if we run the mpirun command on machine #1 again:
mpirun -f hosts -n 4 echo "hello world"
We will most likely see no change in top on machine #2, but that does not prove nothing ran there. mpirun does not inspect a command to decide where it should run; the process manager places processes on the hosts listed in the hosts file, whether the command is a serial one like echo, ls, or pwd or a real MPI program. The catch is that echo finishes in a fraction of a second, while top refreshes only every few seconds (every 3 seconds by default), so a copy of echo that runs on machine #2 can start and exit without ever appearing in top.
Press q to quit top.
In the next blog post, we will write a basic MPI program in C or Python and use the MPI binaries to show processes running across machines on a program designed to run in parallel. A serial program launched with mpirun only runs as independent copies that never exchange data; a program written with MPI is what actually splits the work and passes messages between the machines.