After upgrading to F39, MPI programs are terminated with error messages

I recently upgraded my work machines to Fedora 39. They are somewhat old, but until now they have handled their tasks without problems. After the upgrade, however, my MPI programs (with both MPICH and OpenMPI) print error messages and are terminated. Since the same program runs fine on a newer machine (unfortunately not mine), I suspect that the MPICH and OpenMPI packages no longer support older CPUs. How can I run my MPI programs on my old machines?

My machines each have an Intel(R) Core™ i5-3470 CPU @ 3.20GHz.

When I run my MPI test code with mpirun, I get the following error messages:

  • MPICH:

mpi_test_hello.out:27998 terminated with signal 4 at PC=7fa0aa5fcbb2 SP=7fff69b12c00. Backtrace:
/lib64/libfabric.so.1(+0x5fcbb2)[0x7fa0aa5fcbb2]
/lib64/libfabric.so.1(+0x5e676f)[0x7fa0aa5e676f]
/lib64/libfabric.so.1(+0x5ff2ef)[0x7fa0aa5ff2ef]
/lib64/libfabric.so.1(+0x6052d0)[0x7fa0aa6052d0]
/lib64/libfabric.so.1(+0x679d40)[0x7fa0aa679d40]
/lib64/libfabric.so.1(+0x5d15f0)[0x7fa0aa5d15f0]
/usr/lib64/mpich/lib/libmpi.so.12(+0x2c851b)[0x7fa0aacc851b]
/usr/lib64/mpich/lib/libmpi.so.12(+0x2caceb)[0x7fa0aaccaceb]
/usr/lib64/mpich/lib/libmpi.so.12(+0x3f93da)[0x7fa0aadf93da]
/usr/lib64/mpich/lib/libmpi.so.12(+0x45ff07)[0x7fa0aae5ff07]
/usr/lib64/mpich/lib/libmpi.so.12(MPI_Init+0xdd)[0x7fa0aaad8d8d]
mpi_test_hello.out[0x4011c5]
/lib64/libc.so.6(+0x2814a)[0x7fa0aa84614a]
/lib64/libc.so.6(__libc_start_main+0x8b)[0x7fa0aa84620b]
mpi_test_hello.out[0x4010a5]

  • OpenMPI:

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 0 with PID 0 on node cnet2 exited on signal 4 (Illegal instruction).

Does anyone have an idea how to resolve this problem?

Signal 4 is SIGILL (illegal instruction).
The code you are running needs to be compiled so that it does not use the “newer” instructions that your CPU lacks. Where did libfabric come from?
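
To see which instruction-set extensions the CPU actually advertises, something like the following minimal sketch may help. It assumes an x86 Linux system where /proc/cpuinfo has a "flags" line. The helper names (cpu_flags, missing_flags) and the list of extensions to check are only illustrative; the thread does not say which instruction libfabric tripped over.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; all cores normally report the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted):
    """Return, sorted, the entries of `wanted` that the CPU does not advertise."""
    return sorted(set(wanted) - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    # Hypothetical checklist of newer extensions; adjust as needed.
    wanted = ["avx", "avx2", "fma", "bmi2", "avx512f"]
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("not advertised by this CPU:", missing_flags(text, wanted))
```

If an extension shows up as not advertised and a library was built to use it unconditionally, SIGILL is the expected result.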

I installed all my packages and libraries using dnf.
Here is more detailed information about the installed packages:
mpich.x86_64 : 4.1.2-3.fc39 @ fedora-modular
mpich-devel.x86_64 : 4.1.2-3.fc39 @ fedora-modular
openmpi.x86_64 : 4.1.5-8.fc39 @ updates
openmpi-devel.x86_64 : 4.1.5-8.fc39 @ updates

There is a bug report for this: 2263220 – Illegal instruction in /lib64/libfabric.so.1

Thanks, Jerry. There was an update today, and now it works without any problem.

I believe the fedora-modular repo was completely removed in Fedora 39. The mpich packages listed from it appear to be holdovers from an earlier release and may be the cause of the problem.