Issue
I have a program that uses MPI. To debug it, I can use mpirun -np 2 xterm -e gdb myprog.
However, xterm is buggy on my machine, so I want to try gnome-terminal instead, but I don't know what command to type. I have tried:
1) mpirun -np 2 gnome-terminal -- gdb myprog
2) mpirun -np 2 gnome-terminal -- "gdb myprog"
3) mpirun -np 2 gnome-terminal -- bash -c "gdb myprog"
4) mpirun -np 2 gnome-terminal -- bash -c "gdb myprog; exec bash"
None of these seem to work. For 1), 3) and 4), typing run in gdb gives:
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "(null)" (-43) instead of "Success" (0)
-------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[oleg-VirtualBox:4169] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[Inferior 1 (process 4169) exited with code 01]
In 2) the terminal says:
There was an error creating the child process for this terminal
Failed to execute child process “gdb app” (No such file or directory)
I am using Ubuntu 18.04.2 LTS.
What am I doing wrong?
EDIT: As it turns out, it is not xterm that is buggy but gdb with the --tui option. If the program prints something, the gdb window starts displaying things incorrectly, no matter which terminal it runs in.
Solution
The problem is that gnome-terminal hands the requested program over to a terminal server and then exits immediately. mpirun sees that the program it started has exited and tears down the MPI runtime environment, so by the time the MPI program actually starts, mpirun is already gone. As far as I am aware, there is no way to make gnome-terminal wait until the given command has ended.
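The diagnosis can be reproduced in miniature without MPI or a desktop session. In the sketch below, fake_terminal is a hypothetical stand-in for gnome-terminal: it hands the command to a background process and returns at once, so the caller (mpirun, in the real setup) sees its child finish while the command is still running. This is only a model: with the real gnome-terminal the command runs inside the separate terminal server process rather than as a child of the script.

```bash
#!/bin/bash
# fake_terminal is a hypothetical stand-in for gnome-terminal: it starts
# the command in the background and returns immediately.
fake_terminal() {
    "$@" &
}

fake_terminal sleep 2

# The launcher has already returned, but the command is still alive.
# mpirun, seeing its direct child exit, would now tear down the job.
if kill -0 "$!" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "launcher returned; command still running: $still_running"

wait   # let the background sleep finish before exiting
```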
There is a workaround: instead of starting gnome-terminal directly with mpirun, use two wrapper scripts. The first is started by mpirun. It creates a temporary file, tells gnome-terminal to start the second wrapper script, and then waits until the temporary file has disappeared. The second wrapper script runs the command you actually want to run, e.g. gdb myprog, waits until it ends, and then removes the temporary file. The first script passes the file's name to the second through an exported environment variable. Once the file is gone, the first wrapper notices and exits, and mpirun can safely tear down the MPI environment.
This is probably easier to understand from the scripts themselves.
debug.sh:
#!/bin/bash
# This is run outside gnome-terminal by mpirun.
# Create a tmp file that we can wait on.
export MY_MPIRUN_TMP_FILE="$(mktemp)"
# Start the gnome-terminal. It will exit immediately.
# Call the wrapper script which removes the tmp file
# after the actual command has ended.
gnome-terminal -- ./helper.sh "$@"
# Wait for the file to disappear.
while [ -f "${MY_MPIRUN_TMP_FILE}" ] ; do
    sleep 1
done
# Now exit, so mpirun can destroy the MPI environment
# and exit itself.
helper.sh:
#!/bin/bash
# This is run by gnome-terminal.
# The command you actually want to run.
"$@"
# Remove the tmp file to show that the command has exited.
rm "${MY_MPIRUN_TMP_FILE}"
Make both scripts executable (chmod +x debug.sh helper.sh), then start the debugger as mpirun -np 2 ./debug.sh gdb myprog.
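One weakness of these scripts: if helper.sh never reaches its rm line (for example because its window is closed while gdb is running), debug.sh keeps polling forever. A possible hardening, sketched below and not part of the original answer, is to delete the marker file from an EXIT trap in the helper and to put an upper bound on the wait loop. The function names run_helper and wait_for_removal and the 5-second limit are hypothetical. A backgrounded run_helper call stands in for gnome-terminal so the handshake can be tried without MPI or a desktop session.

```bash
#!/bin/bash

# Defensive helper.sh body: the EXIT trap removes the marker file when
# the subshell ends, whatever exit status the command returned.
run_helper() {
    (
        trap 'rm -f "$MY_MPIRUN_TMP_FILE"' EXIT
        "$@"
    )
}

# Bounded version of the debug.sh wait loop: returns 0 once the file is
# gone, or 1 if it still exists after max_seconds.
wait_for_removal() {
    local file=$1 max_seconds=$2 waited=0
    while [ -f "$file" ]; do
        if [ "$waited" -ge "$max_seconds" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Handshake demo: the background run_helper plays the role of
# "gnome-terminal -- ./helper.sh CMD"; the command fails on purpose.
export MY_MPIRUN_TMP_FILE="$(mktemp)"
run_helper sh -c 'sleep 1; exit 3' &
if wait_for_removal "$MY_MPIRUN_TMP_FILE" 5; then
    handshake=ok
else
    handshake=timeout
fi
wait
echo "handshake: $handshake"
```

In the real scripts, helper.sh would set trap 'rm -f "${MY_MPIRUN_TMP_FILE}"' EXIT before "$@" instead of calling rm at the end, and debug.sh would use a bounded loop in place of its while loop. Whether the trap also fires when the terminal window is closed depends on how the shell is terminated, which is why the timeout is there as a backstop. Note the trade-off: if the limit is shorter than the debugging session, debug.sh exits early and mpirun tears the job down mid-session, so choose it generously.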
Answered By - rtoijala Answer Checked By - Katrina (WPSolving Volunteer)