Issue
I've recently run into some slightly odd behaviour when running commands over ssh. I would be interested to hear any explanations for the behaviour below.
Running ssh localhost 'touch foobar &' creates a file called foobar as expected:
[bob@server ~]$ ssh localhost 'touch foobar &'
[bob@server ~]$ ls foobar
foobar
However, running the same command with the -t option to force pseudo-tty allocation fails to create foobar:
[bob@server ~]$ ssh -t localhost 'touch foobar &'
Connection to localhost closed.
[bob@server ~]$ echo $?
0
[bob@server ~]$ ls foobar
ls: cannot access foobar: No such file or directory
My current theory is that, because the touch process is backgrounded, the pseudo-tty is allocated and deallocated before the process has a chance to run. Certainly, adding a one-second sleep allows touch to run as expected:
[bob@pidora ~]$ ssh -t localhost 'touch foobar & sleep 1'
Connection to localhost closed.
[bob@pidora ~]$ ls foobar
foobar
If anyone has a definitive explanation I would be very interested to hear it. Thanks.
Solution
Oh, that's a good one.
This is related to how process groups work, how bash behaves when invoked as a non-interactive shell with -c, and the effect of & in input commands.
The answer assumes you're familiar with how job control works in UNIX; if you're not, here's a high-level view. Every process belongs to a process group. The processes in the same group are often put there as part of a command pipeline, e.g. cat file | sort | grep 'word' would place the processes running cat(1), sort(1) and grep(1) in the same process group. bash is a process like any other, and it also belongs to a process group. Process groups are part of a session (a session is composed of one or more process groups). In a session, there is at most one foreground process group, and possibly many background process groups. The foreground process group has control of the terminal (if there is a controlling terminal attached to the session); the session leader (bash) moves process groups from background to foreground and from foreground to background with tcsetpgrp(3). A signal sent to a process group is delivered to every process in that group.
If the concept of process groups and job control is completely new to you, I think you'll need to read up on that to fully understand this answer. A great resource to learn this is Chapter 9 of Advanced Programming in the UNIX Environment (3rd edition).
That being said, let's see what is happening here. We have to fit together every piece of the puzzle.
In both cases, the ssh remote side invokes bash(1) with -c. The -c flag causes bash(1) to run as a non-interactive shell. From the manpage:
An interactive shell is one started without non-option arguments and without the -c option whose standard input and error are both connected to terminals (as determined by isatty(3)), or one started with the -i option. PS1 is set and $- includes i if bash is interactive, allowing a shell script or a startup file to test this state.
Also, it is important to know that job control is disabled when bash is started in non-interactive mode. This means that bash will not create a separate process group to run the command: with job control disabled, there is no need to move the command between foreground and background, so it might as well remain in the same process group as bash. This happens whether or not you forced PTY allocation on ssh with -t.
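The shell's own view of these settings shows up in $-, the variable mentioned in the manpage excerpt above: i marks an interactive shell and m marks job control (monitor mode). Here is a minimal local sketch, assuming bash is on your PATH; no ssh is involved and the variable names are only illustrative:

```shell
# Flags of a shell started with -c, as the remote side of ssh does.
flags_default=$(bash -c 'echo "$-"')

# The same kind of shell after job control is switched on explicitly.
flags_monitor=$(bash -c 'set -m; echo "$-"')

echo "bash -c:          $flags_default"
echo "bash -c + set -m: $flags_monitor"
```

The first line should contain neither i nor m; the second should contain m.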
However, the use of & has the side effect of causing the shell not to wait for command termination (even if job control is disabled). From the manpage:
If a command is terminated by the control operator &, the shell executes the command in the background in a subshell. The shell does not wait for the command to finish, and the return status is 0. Commands separated by a ; are executed sequentially; the shell waits for each command to terminate in turn. The return status is the exit status of the last command executed.
So, in both cases, bash will not wait for command execution, and touch(1) will be executed in the same process group as bash(1).
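The "does not wait, return status is 0" part is easy to see in isolation. This is a small sketch using false(1) as a stand-in for a command that fails; the variable names are made up for the example:

```shell
# A failing command run in the background: "&" still reports success
# immediately, because the shell has not waited for the command.
false &
launch_status=$?

# Only an explicit wait collects the real exit status of the child.
real_status=0
wait "$!" || real_status=$?

echo "status right after '&': $launch_status"
echo "status from wait:       $real_status"
```

The status right after & is 0, while wait reports the real exit status of false, which is 1.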
Now, consider what happens when a session leader exits. Quoting from the setpgid(2) manpage:
If a session has a controlling terminal, and the CLOCAL flag for that terminal is not set, and a terminal hangup occurs, then the session leader is sent a SIGHUP. If the session leader exits, then a SIGHUP signal will also be sent to each process in the foreground process group of the controlling terminal.
(Emphasis mine)
When you don't use -t
When you don't use -t, there is no PTY allocation on the remote side, so bash is not a session leader, and in fact no new session is created. Because sshd is running as a daemon, the bash process that is forked + exec()'d will not have a controlling terminal. As such, even though the shell terminates very quickly (probably before touch(1)), there is no SIGHUP sent to the process group, because bash wasn't a session leader (and there is no controlling terminal). So everything works.
When you use -t
-t forces PTY allocation, which means that the ssh remote side will call setsid(2), allocate a pseudo-terminal and fork a new process with forkpty(3), connect the PTY master device input and output to the socket endpoints that lead to your machine, and finally execute bash(1). forkpty(3) opens the PTY slave side in the forked process that will become bash; since there's no controlling terminal for the current session, and a terminal device is being opened, the PTY device becomes the controlling terminal for the session and bash becomes the session leader.
Then the same thing happens again: touch(1) is executed in the same process group as bash, and bash does not wait for it. The difference is that, this time, there is a session leader and a controlling terminal. So, since bash does not bother waiting because of the &, when it exits, SIGHUP is delivered to the process group and touch(1) dies prematurely.
About nohup
nohup(1) doesn't work here because there is still a race condition. If bash(1) terminates before nohup(1) has the chance to set up the necessary signal handling and file redirection, it will have no effect (which is probably what happens).
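To make the race concrete, here is a rough local sketch that sends SIGHUP by hand instead of relying on a session leader exiting. It assumes SIGHUP is not already ignored in the environment you run it from, and it uses trap '' HUP followed by exec as a stand-in for what nohup(1) arranges before running its command. The sleep 1 in the second half is the important bit: without that head start, the signal could arrive before it is being ignored.

```shell
# 1) Default disposition: SIGHUP terminates the process.
sleep 5 &
plain_pid=$!
kill -HUP "$plain_pid"
plain_status=0
wait "$plain_pid" || plain_status=$?      # killed by signal 1: 128 + 1

# 2) SIGHUP ignored before exec; the ignored state survives the exec.
( trap '' HUP; exec sleep 2 ) &
ignoring_pid=$!
sleep 1                                   # head start so trap has run
kill -HUP "$ignoring_pid"
ignoring_status=0
wait "$ignoring_pid" || ignoring_status=$?

echo "default disposition: exit status $plain_status"
echo "SIGHUP ignored:      exit status $ignoring_status"
```

The first sleep is killed (status 129); the second one ignores the signal and finishes normally (status 0).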
A possible fix
Forcefully re-enabling job control fixes it. In bash, you do that with set -m. This works:
ssh -t localhost 'set -m ; touch foobar &'
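The effect of set -m on process groups can be checked locally without ssh. The sketch below is Linux-only (it reads the process group ID from field 5 of /proc/PID/stat) and assumes bash is installed; show_groups and pgid_of are hypothetical helper names invented for this example.

```shell
# Print "<pgid of the bash -c shell> <pgid of its background child>".
# $1 is an optional prefix for the command string, e.g. "set -m;".
show_groups() {
    bash -c "$1"'
        pgid_of() {                      # Linux only: parse /proc/PID/stat
            stat=$(cat "/proc/$1/stat")
            set -- ${stat##*) }          # drop "pid (comm) ", then split
            echo "$3"                    # remaining fields: state ppid pgrp ...
        }
        sleep 1 >/dev/null 2>&1 &
        echo "$(pgid_of $$) $(pgid_of $!)"
    '
}

show_groups ""          # job control off: both numbers are equal
show_groups "set -m;"   # job control on: the child has its own group
```

With job control off, the background command stays in the shell's process group; with set -m it is placed in a group of its own, so it is no longer part of the group that is signalled when the shell exits.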
Or force bash to wait for touch(1) to complete:
ssh -t localhost 'touch foobar & wait `pgrep touch`'
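As a variation on the pgrep form (this is my own assumption about what is more convenient, not something the command above requires), a bare wait waits for every child of the shell, and wait $! waits for the most recent background job, so no PID lookup is needed. Here is a local sketch of the same shape, using a scratch directory from mktemp so it does not touch your home directory:

```shell
workdir=$(mktemp -d)

# Same shape as the remote command, run locally: background touch, then wait.
# A bare "wait" blocks until every child of this bash -c shell has exited.
bash -c 'touch "$1" & wait' _ "$workdir/foobar"

if [ -e "$workdir/foobar" ]; then created=yes; else created=no; fi
echo "foobar created: $created"

rm -r "$workdir"
```

Over ssh, the equivalent would be ssh -t localhost 'touch foobar & wait'.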
Answered By - Filipe Gonçalves