We ran into an ORTE error, Error: unknown option "--hnp-topo-sig", while using Open MPI 1.10.2 for arm64 on Ubuntu 14.04 Server. Specifically, we ran the following command across 2 nodes:

mpirun --hostfile /nfs/hostnames -n 4 /nfs/mpi-hello-world/mpi_hello_world

ORTE errors can have many causes, but a common one is that your mpirun is not the same version as your MPI compiler wrapper (mpicc). Even if you think you have only one MPI version installed, you may in fact have several.
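A quick way to spot this kind of mismatch is to check whether the mpicc and mpirun found first on your PATH come from the same installation directory. The bash function below is our own sketch (same_mpi_install is a hypothetical name, not part of Open MPI). It resolves symlinks with GNU readlink -f, which matters on Ubuntu, where /usr/bin entries often point through /etc/alternatives. Sharing a directory is only a proxy for sharing a version; mpirun --version and mpicc --showme:version print the actual version strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the mpicc and mpirun that the shell
# would run (first match on PATH) live in the same bin directory.
same_mpi_install() {
  local cc run cc_dir run_dir
  cc=$(command -v mpicc) || { echo "mpicc not found on PATH" >&2; return 2; }
  run=$(command -v mpirun) || { echo "mpirun not found on PATH" >&2; return 2; }
  # Follow symlinks (e.g. /usr/bin/mpirun -> /etc/alternatives/...) to the real file.
  cc_dir=$(dirname "$(readlink -f "$cc")")
  run_dir=$(dirname "$(readlink -f "$run")")
  echo "mpicc  -> $cc_dir"
  echo "mpirun -> $run_dir"
  # Return status 0 when both resolve to the same directory, 1 otherwise.
  [ "$cc_dir" = "$run_dir" ]
}
```

Running same_mpi_install && echo same || echo DIFFERENT prints both directories, followed by a one-word verdict.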

How we fixed our problem

First, we checked where our MPI C compiler was located.

which mpicc
/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpicc

Nothing odd here: the MPI C compiler was where we expected it. However, which only reports the first match on your PATH, so we still had to check whether we actually had multiple versions of MPI.

mpicc (press tab twice)
mpicc mpicc.openmpi
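Tab completion does the job, but it does not show which directory each candidate lives in. The bash function below is a hypothetical helper of ours (list_mpi_tools is not an existing tool). It walks every directory on PATH and prints the full path of each executable whose name starts with a given prefix, so both mpicc and mpicc.openmpi would appear with their locations. The builtin type -a mpicc is similar but only matches the exact name, so it would not list mpicc.openmpi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the full path of every executable on PATH whose
# name begins with the given prefix (default "mpi"), in PATH order.
list_mpi_tools() {
  local prefix=${1:-mpi}
  local dir f
  local IFS=:                         # split PATH on colons only
  for dir in $PATH; do
    [ -n "$dir" ] || continue         # skip empty PATH entries
    for f in "$dir/$prefix"*; do
      # An unmatched glob stays literal, so confirm the file really exists.
      if [ -f "$f" ] && [ -x "$f" ]; then
        printf '%s\n' "$f"
      fi
    done
  done
  return 0
}
```

For example, list_mpi_tools mpicc lists every compiler wrapper, and list_mpi_tools mpirun does the same for the launchers.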

We saw a second MPI C compiler, mpicc.openmpi, on our machine! If you get the ORTE error and do have two versions of MPI, use full paths when calling mpicc and mpirun. Let's see if that works.
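One way to make "use full paths" harder to get wrong is to pin a single installation prefix in a variable and always call its tools through it. In the sketch below, MPI_HOME is our own naming choice (Open MPI does not read it), defaulting to the installation shown above, and check_mpi_home is a hypothetical guard that fails early if mpicc or mpirun is missing from that prefix.

```shell
#!/usr/bin/env bash
# MPI_HOME is our own variable name for the installation we intend to use.
MPI_HOME=${MPI_HOME:-/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux}

# Hypothetical guard: succeed only if both wrappers exist and are executable
# under the given prefix (defaults to $MPI_HOME).
check_mpi_home() {
  local home=${1:-$MPI_HOME} tool
  for tool in mpicc mpirun; do
    if [ ! -x "$home/bin/$tool" ]; then
      echo "missing or not executable: $home/bin/$tool" >&2
      return 1
    fi
  done
  return 0
}

# Typical use: stop before compiling or launching with the wrong MPI.
#   check_mpi_home || exit 1
#   "$MPI_HOME/bin/mpicc" -o mpi_hello_world mpi_hello_world.c
#   "$MPI_HOME/bin/mpirun" -n 4 /nfs/mpi-hello-world/mpi_hello_world
```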

Test a small MPI program on 2 nodes

First, we change into our NFS shared folder and clone the test program.

cd /nfs
git clone https://github.com/huyle333/mpi-hello-world
cd mpi-hello-world

We compile the mpi-hello-world program using the full path of the MPI compiler we expect to use.

/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpicc -o mpi_hello_world mpi_hello_world.c

This creates the mpi_hello_world binary. Next, we test mpirun on 1 node, again using the full path of the mpirun command.

which mpirun
/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun
ubuntu@tegra1-ubuntu:~/nfs/mpi-hello-world$ /opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun -n 4 /nfs/mpi-hello-world/mpi_hello_world
Hello world from processor tegra1-ubuntu, rank 0 out of 4 processors
Hello world from processor tegra1-ubuntu, rank 1 out of 4 processors
Hello world from processor tegra1-ubuntu, rank 2 out of 4 processors
Hello world from processor tegra1-ubuntu, rank 3 out of 4 processors

mpirun works fine on 1 node. Now let's try 2 nodes. /nfs/hostnames contains the IP addresses of the 2 nodes we want to use.
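For reference, an Open MPI hostfile lists one host per line, optionally followed by slots=N, and lines starting with # are comments. The snippet below writes such a file. The two addresses are placeholders from the 192.0.2.0/24 documentation range (the contents of our real /nfs/hostnames are not shown here), and slots=4 is an assumption chosen to match the four ranks per node in the run below. write_hostfile is a hypothetical helper that takes the output path as an argument, so you can try it without touching /nfs/hostnames.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a two-node Open MPI hostfile to the given path.
# 192.0.2.11 and 192.0.2.12 are placeholders; replace them with your nodes' IPs.
write_hostfile() {
  if [ -z "$1" ]; then
    echo "usage: write_hostfile PATH" >&2
    return 2
  fi
  cat > "$1" <<'EOF'
# one node per line; slots = number of processes to place on that node
192.0.2.11 slots=4
192.0.2.12 slots=4
EOF
}
```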

ubuntu@tegra1-ubuntu:~/nfs/mpi-hello-world$ /opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun --hostfile /nfs/hostnames -n 8 /nfs/mpi-hello-world/mpi_hello_world
Hello world from processor tegra1-ubuntu, rank 2 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 3 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 1 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 0 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 7 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 5 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 6 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 4 out of 8 processors

Eureka! It works: the hello world message is printed from both tegra1-ubuntu and tegra2-ubuntu.
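With 8 ranks the output is easy to eyeball, but for larger runs a small filter can count how many ranks landed on each host. The function below (count_ranks_per_host, our own hypothetical name) assumes the exact "Hello world from processor HOST, rank ..." line format printed by this hello-world program; other programs would need a different pattern.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read hello-world output on stdin and print
# "<host> <number of ranks>" for each host, sorted by host name.
count_ranks_per_host() {
  # Field 5 is the host name followed by a comma, e.g. "tegra1-ubuntu,".
  awk '/^Hello world from processor / { sub(/,$/, "", $5); n[$5]++ }
       END { for (h in n) print h, n[h] }' | sort
}
```

Piping the 2-node run above through it, for example /opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun --hostfile /nfs/hostnames -n 8 /nfs/mpi-hello-world/mpi_hello_world | count_ranks_per_host, would print tegra1-ubuntu 4 and tegra2-ubuntu 4 for the output shown.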

For your actual program, likewise use full paths to make sure you are not running a different version of mpirun, and it should work.
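A multi-node run also starts Open MPI daemons on the other hosts, so a version mismatch can hide there as well. Our reading of the Open MPI mpirun documentation is that invoking mpirun by absolute path is treated like passing --prefix with that installation, which sets PATH and LD_LIBRARY_PATH on the remote nodes; that would be consistent with full paths fixing our error, but we have not verified it beyond this case. The sketch below is a hypothetical pair of bash functions: hostfile_hosts extracts host names from a hostfile, and check_remote_mpi uses ssh (assumed passwordless, as for mpirun's own ssh-based launch) to print the first line of mpirun --version from $MPI_HOME on each host. MPI_HOME is our own variable name.

```shell
#!/usr/bin/env bash
# MPI_HOME is our own variable name, defaulting to the installation used above.
MPI_HOME=${MPI_HOME:-/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux}

# Hypothetical helper: print the host field of each non-empty,
# non-comment line of a hostfile.
hostfile_hosts() {
  local line host
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%%#*}                 # drop comments
    read -r host _ <<< "$line"       # keep the first whitespace-separated field
    [ -n "$host" ] && printf '%s\n' "$host"
  done < "$1"
  return 0
}

# Hypothetical check: for each host, print the first line of the remote output,
# i.e. the version string, or the error if ssh fails or mpirun is missing there.
check_remote_mpi() {
  local host
  for host in $(hostfile_hosts "$1"); do
    printf '%s: ' "$host"
    ssh -n "$host" "$MPI_HOME/bin/mpirun --version" 2>&1 | head -n 1
  done
}
```

Running check_remote_mpi /nfs/hostnames should show the same Open MPI version on every node; a different version or a "not found" error points at the node to fix.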