#1000446 liblam4: Running parallel program yields "Unix errno: 14 Bad address"

Package:
liblam4
Source:
lam
Description:
Shared libraries used by LAM parallel programs
Submitter:
Ralf Schlatterbeck
Date:
2021-11-23 10:48:03 UTC
Severity:
important
#1000446#5
Date:
2021-11-23 10:35:28 UTC
I'm trying to run a parallel program built with lam4-dev.

The pre-flight check with

recon ~/.mpi-lam-machinefile
works (and reports "Woo hoo! [...]")

The machinefile:
opi9 cpu=4
opi2 cpu=4
opi3 cpu=4
opi4 cpu=4
opi5 cpu=4

Starting the cluster works, too:
% lamboot ~/.mpi-lam-machinefile

LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University


I'm running the program with either:

mpiexec C /usr/local/bin/folded_antenna -r 28 optimize
or with an explicit number of processes:
mpiexec -c 2 /usr/local/bin/folded_antenna -r 28 optimize
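
For reference, a minimal sketch of how many processes the `C` form would be expected to start for the machinefile above. It assumes LAM's documented convention that `C` launches one process per CPU declared in the boot schema, that a host without a `cpu=` key counts as one CPU, and that repeated hosts add up. The function name `count_cpus` is made up for this illustration and is not part of LAM.

```python
import re


def count_cpus(machinefile_text):
    """Return {hostname: cpu_count} for a LAM-style boot schema (sketch)."""
    cpus = {}
    for raw in machinefile_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        count = 1  # assumption: one CPU when no cpu= key is given
        for field in fields[1:]:
            match = re.fullmatch(r"cpu=(\d+)", field)
            if match:
                count = int(match.group(1))
        # Assumption: a host listed more than once accumulates CPUs.
        cpus[host] = cpus.get(host, 0) + count
    return cpus


# The machinefile quoted in this report.
MACHINEFILE = """\
opi9 cpu=4
opi2 cpu=4
opi3 cpu=4
opi4 cpu=4
opi5 cpu=4
"""

per_host = count_cpus(MACHINEFILE)
total = sum(per_host.values())
print(per_host)
print("processes expected for 'C':", total)
```

Under these assumptions the `C` invocation asks for 20 processes across five hosts, while `-c 2` asks for only two, so a failure that shows up in both forms is unlikely to depend on the process count alone.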

I'm getting multiple messages of the form:
-----------------------------------------------------------------------------
It seems that some error has occurred during MPI_INIT.  This will
cause your process to abort.  These kinds of errors are usually
system-related, such as running out of disk space, running out of
memory, or something more serious such as data not being passed
between processes properly.  That is, you should not be seeing this
error message; if you are, something is likely Very Wrong with your
system.  :-(

Perhaps this Unix error message will help:

        Unix errno: 14
        Bad address
-----------------------------------------------------------------------------

My hope is that this is a simple usage error ...

Note that the same program works fine when compiled & linked with mpich from Debian.