We encountered an ORTE error, Error: unknown option "--hnp-topo-sig", while using OpenMPI version 1.10.2 for arm64 on an Ubuntu 14.04 server. More specifically, we ran the following command across 2 nodes:
mpirun --hostfile /nfs/hostnames -n 4 /nfs/mpi-hello-world/mpi_hello_world
[tegra2-ubuntu:04159] Error: unknown option "--hnp-topo-sig"
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------
ORTE errors can have many causes, but the usual one is that your mpirun does not come from the same MPI version as your MPI compiler. Even if you think you have only one MPI version installed, you may in fact have several.
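A quick way to look for this kind of mismatch is to compare where the two commands resolve on your PATH. The snippet below is only a sketch: same_install_dir is a helper name we made up for illustration, and it compares directories rather than actual version strings, so treat a match as a hint and not as proof.

```shell
#!/usr/bin/env bash
# same_install_dir is a hypothetical helper, not part of Open MPI.
# It reports whether two commands resolve to the same directory on PATH.
# Exit status: 0 = same directory, 1 = different, 2 = a command is missing.
same_install_dir() {
    local a b
    a=$(command -v "$1") || { echo "$1: not found on PATH"; return 2; }
    b=$(command -v "$2") || { echo "$2: not found on PATH"; return 2; }
    if [ "$(dirname "$a")" = "$(dirname "$b")" ]; then
        echo "match: $(dirname "$a")"
    else
        echo "MISMATCH: $a vs $b"
        return 1
    fi
}

# Compare the compiler wrapper and the launcher; never abort the script.
same_install_dir mpicc mpirun || true
```

Because mpirun also starts a daemon on every remote node, it is worth running the same check on each node listed in your hostfile.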
How we fixed our problem
First, we checked where our MPI C compiler was located.
which mpicc
/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpicc
Nothing odd here. The MPI C compiler was located where we expected. But we still had to check whether we actually had multiple versions of MPI installed.
mpicc (press tab twice)
mpicc mpicc.openmpi
We saw a second version of the MPI C compiler on our machine! If you have the ORTE error and you indeed have two versions of MPI, you should use full paths when using mpirun. Let’s see if it works.
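The tab-completion check can also be scripted, which is handy on nodes where you do not have an interactive shell. This is a minimal sketch: list_on_path is a hypothetical helper name, and it simply walks PATH and prints every executable whose name starts with the given prefix.

```shell
# list_on_path is a hypothetical helper: print every executable file on
# PATH whose name starts with the given prefix (for example mpicc).
# The body runs in a subshell so the IFS change does not leak out.
list_on_path() (
    prefix=$1
    IFS=:
    for dir in $PATH; do
        for f in "$dir/$prefix"*; do
            if [ -f "$f" ] && [ -x "$f" ]; then
                echo "$f"
            fi
        done
    done
)

list_on_path mpicc
list_on_path mpirun
```

More than one line of output for either command means that more than one MPI installation is reachable from your PATH.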
Test a small MPI program on 2 nodes
First, we change directory into our NFS shared folder.
cd /nfs
git clone https://github.com/huyle333/mpi-hello-world
cd mpi-hello-world
We want to compile the mpi-hello-world program with the full path of the MPI compiler that we are expecting.
/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpicc -o mpi_hello_world mpi_hello_world.c
mpi_hello_world is the created binary. Now, we test whether we can use mpirun with 1 node. Use the full path of the mpirun command.
which mpirun
/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun
/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun -n 4 /nfs/mpi-hello-world/mpi_hello_world
Hello world from processor tegra1-ubuntu, rank 0 out of 4 processors
Hello world from processor tegra1-ubuntu, rank 1 out of 4 processors
Hello world from processor tegra1-ubuntu, rank 2 out of 4 processors
Hello world from processor tegra1-ubuntu, rank 3 out of 4 processors
Using mpirun on 1 node works fine. Now let’s try 2 nodes. /nfs/hostnames contains the IP addresses of the 2 nodes that we want to use.
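For readers who have not written a hostfile before, here is a rough sketch of what a two-node file can look like. The addresses are placeholders from the documentation range (not our real nodes), the output path is made up, and slots=4 is our assumption based on seeing 4 ranks per node in the output.

```shell
# Write an example two-node hostfile. Everything here is illustrative:
# the path, the placeholder addresses, and the slots=4 value are assumptions.
hostfile=${HOSTFILE:-${TMPDIR:-/tmp}/hostnames.example}
cat > "$hostfile" <<'EOF'
192.0.2.11 slots=4
192.0.2.12 slots=4
EOF

# Show the result.
cat "$hostfile"
```

Each line names one node, and the optional slots value tells mpirun how many processes it may place there.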
/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux/bin/mpirun --hostfile /nfs/hostnames -n 8 /nfs/mpi-hello-world/mpi_hello_world
Hello world from processor tegra1-ubuntu, rank 2 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 3 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 1 out of 8 processors
Hello world from processor tegra1-ubuntu, rank 0 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 7 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 5 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 6 out of 8 processors
Hello world from processor tegra2-ubuntu, rank 4 out of 8 processors
Eureka! It works: the hello world output comes from both tegra1-ubuntu and tegra2-ubuntu.
Now for your actual program, use full paths to make sure that you are not running a different version of mpirun, and it should work.
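To avoid retyping the long path, you can keep the installation prefix in one variable and build every command from it. The sketch below assumes the prefix from our which mpicc output; MPI_PREFIX and mpi_cmd are names we invented for illustration. As far as we know, the Open MPI mpirun man page states that running mpirun by absolute path is equivalent to passing --prefix, which would explain why full paths also fix the daemon on the remote node.

```shell
# Installation prefix taken from the `which mpicc` output in this post;
# override MPI_PREFIX for your own system.
MPI_PREFIX=${MPI_PREFIX:-/opt/arm/openmpi-1.10.2_Cortex-A57_Ubuntu-14.04_aarch64-linux}

# mpi_cmd is a hypothetical helper: print the absolute path of a tool
# that lives in $MPI_PREFIX/bin.
mpi_cmd() {
    printf '%s/bin/%s\n' "$MPI_PREFIX" "$1"
}

# Only compile and run when that installation exists on this machine.
if [ -x "$(mpi_cmd mpirun)" ] && [ -x "$(mpi_cmd mpicc)" ]; then
    "$(mpi_cmd mpicc)" -o /nfs/mpi-hello-world/mpi_hello_world /nfs/mpi-hello-world/mpi_hello_world.c
    "$(mpi_cmd mpirun)" --hostfile /nfs/hostnames -n 8 /nfs/mpi-hello-world/mpi_hello_world
else
    echo "no mpicc/mpirun found under $MPI_PREFIX/bin"
fi
```

Switching to another installation then only requires changing MPI_PREFIX, and the compiler and launcher cannot drift apart.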