I've tried the following trivial program:
#include <mpi.h>

int
main (int argc, char **argv)
{
  MPI_Init (&argc, &argv);
  MPI_Finalize ();
  return 0;
}
Compile it:
riff /tmp> mpic++.lam lamtest.cc -o lamtest
/usr/lib/lam/include/mpi2cxx/functions_inln.h: In function 'void PMPI::Pcontrol(int, ...)':
/usr/lib/lam/include/mpi2cxx/functions_inln.h:249: warning: cannot pass objects of non-POD type 'struct va_list' through '...'; call will abort at runtime
OK, that is a pretty weird warning, but IIRC older LAM versions emitted
it as well.
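For context, here is a minimal sketch of the general pattern this kind of warning points at. It is not the code from functions_inln.h, and the function names are made up for illustration. The warning presumably appears because a va_list object is itself handed to a '...' parameter; the usual way around that is a "v" variant that takes the va_list as an ordinary argument:

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical "v" variant: receives the already started va_list
// as a normal, named parameter.
std::string vformat_message(const char *fmt, va_list ap)
{
    char buf[256];
    std::vsnprintf(buf, sizeof buf, fmt, ap);
    return std::string(buf);
}

// Hypothetical variadic front end: it never passes `ap` through
// another '...' parameter, it forwards it to the "v" variant instead.
std::string format_message(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    std::string result = vformat_message(fmt, ap);
    va_end(ap);
    return result;
}
```

Calling format_message("%d-%s", 7, "x") yields the string "7-x".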
Running it without a lamd fails, as expected:
riff /tmp> ./lamtest
-----------------------------------------------------------------------------
It seems that there is no lamd running on the host riff.
This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for MPI programs to run
(the MPI program tired to invoke the "MPI_Init" function).
Please run the "lamboot" command the start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
-----------------------------------------------------------------------------
Exit 215
So let's do it correctly:
riff /tmp> lamboot
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
riff /tmp> ./lam
lam-thimo@riff/ lamtest*
riff /tmp> ./lamtest
[hangs]
Calling "lamnodes" in another shell hangs too; it does not hang if the
test program has not been started.
I've attached a full strace of lamtest. If I read it correctly, the
program opens a socket:
chdir("/tmp/lam-thimo@riff") = 0
socket(PF_FILE, SOCK_STREAM, 0) = 3
connect(3, {sa_family=AF_FILE, path="lam-kernel-socket"}, 19) = 0
chdir("/tmp")
then communicates for quite a while until
writev(3, [{"\r\0\0@F\270W?\0\0\0\0\17\0\0@\0\0\0\0\0\0\0\0\0\0\0\0"..., 72}, {ptrace: umoven: Input/output error
and then file descriptor 3 seems to be broken.
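To check the connection step independently of the MPI library, the sequence from the strace (chdir into the session directory, connect to the Unix socket by its relative name, chdir back) could be replayed with something like the sketch below. The function name and return convention are my own, and it only tests whether the connect succeeds; it does not speak the LAM protocol. PF_FILE in the strace is the same address family as AF_UNIX.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Hypothetical helper: returns 0 if a stream connection to the Unix
// socket `name` inside directory `dir` succeeds, otherwise the errno
// value of the first step that failed.
int probe_unix_socket(const std::string &dir, const std::string &name)
{
    char cwd[4096];
    if (getcwd(cwd, sizeof cwd) == nullptr)
        return errno;
    if (chdir(dir.c_str()) != 0)
        return errno;

    int result = 0;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        result = errno;
    } else {
        sockaddr_un addr;
        std::memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        // Relative path, as in the strace; resolved against `dir`.
        std::strncpy(addr.sun_path, name.c_str(), sizeof addr.sun_path - 1);
        if (connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0)
            result = errno;
        close(fd);
    }

    // Go back to where we started, as the traced program does.
    if (chdir(cwd) != 0 && result == 0)
        result = errno;
    return result;
}
```

With the paths from the strace this would be called as probe_unix_socket("/tmp/lam-thimo@riff", "lam-kernel-socket"); a result of 0 would mean that lamd still accepts new connections at that point.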
Cheers
Thimo