Thursday, October 6, 2022

[SOLVED] SIGTERM not trapped while command is running, but SIGINT is

Issue

I'm building some CI pipelines, and part of that is a bash wrapper script around a docker container running ansible commands. The trouble I'm having is that when a job is aborted the container keeps running, which is potentially dangerous.

What I have currently is:

#!/bin/bash

CONTAINER=ansible

function kill_container() {
  echo "$0 caught $1" >&2
  docker kill "${CONTAINER}"
  exit $?
}
trap 'kill_container SIGINT' SIGINT
trap 'kill_container SIGTERM' SIGTERM

function ansible_base() {
  docker run -d --rm --name "${CONTAINER}" someorg/ansible:latest "$@"
  docker logs --follow "${CONTAINER}"
}

ansible_base "$@"

and my local test is simply ./run.sh sleep 30.

For the purpose of reproducibility, you can substitute alpine:latest as the image and it behaves the same.

Before I added -d to docker run and followed it with docker logs, the script did not respect SIGINT at all; now it works as expected. E.g.:

./ci/run.sh sleep 30
5f5d78cfea27cdc15f5fede2003352253ae3254f44489ab4689ebca8d0f91768
^C./ci/run.sh caught SIGINT
ansible

However, if I run pkill run.sh from another terminal, the script still waits the full 30 seconds before handling the signal, and docker kill then fails because the container is already gone. E.g.:

./ci/run.sh sleep 30
a642a1060dc9d340e92dc255d68a9d9cb26d62ec59c5ef8d4e3d4198f1692c3e
./ci/run.sh caught SIGTERM
Error response from daemon: Cannot kill container: ansible: Container a642a1060dc9d340e92dc255d68a9d9cb26d62ec59c5ef8d4e3d4198f1692c3e is not running

Ultimately, the observed behaviour in the CI system is the same. The process is sent a SIGTERM and then, after it fails to respond for 30 seconds, a SIGKILL. This terminates the wrapper script, but not the docker command.
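
The difference between the two signals is consistent with how bash documents trap handling: while bash is waiting for a foreground command to finish, a trapped signal is not acted on until that command completes, whereas the wait builtin returns immediately when a trapped signal arrives. Ctrl-C behaves differently because the terminal delivers SIGINT to the whole foreground process group, so docker logs receives it too and exits, while pkill run.sh signals only the script. A minimal sketch of the background-and-wait pattern is below. In it, sleep 30 is a stand-in for docker logs --follow, and the on_term handler and the self-sent SIGTERM are illustrative assumptions, not part of the original script:

```shell
#!/bin/bash
# Sketch only: `sleep 30` stands in for the long-running `docker logs --follow`.
caught=""

# Hypothetical handler: record the signal and stop the stand-in job.
on_term() {
  caught="SIGTERM"
  echo "$0 caught SIGTERM" >&2
  kill "$child" 2>/dev/null
}
trap on_term TERM

start=$(date +%s)

# Run the job in the background instead of the foreground...
sleep 30 &
child=$!

# Demo only: send this script a SIGTERM after one second.
( sleep 1; kill -TERM $$ ) &

# ...and block in `wait`, which a trapped signal interrupts right away.
status=0
wait "$child" || status=$?

# Reap the stand-in job that the handler killed.
wait "$child" 2>/dev/null

elapsed=$(( $(date +%s) - start ))
echo "wait returned ${status} after ${elapsed}s"
```

Run as-is, this should report a wait status above 128 after roughly one second rather than thirty. Had sleep 30 been run in the foreground, the handler would only have fired once the sleep finished, which is the delay seen with pkill above.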


Solution

As @brunson said, I needed an init process to handle signal propagation.

When I was originally writing this, my thought was "it's just a command, it doesn't need an init", which was somewhat true until the very instant I needed it to respect signals at all. Frankly, it was a foolish thought in the first place.

Anyhow, to accomplish the fix I used tini.

Added to Dockerfile:

RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]

and run.sh is back down to a much more manageable:

#!/bin/bash

function ansible_base() {
  docker run --rm someorg/ansible:latest "$@"
}

ansible_base "$@"
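
If changing the image is not an option, a possible alternative is Docker's own --init flag on docker run, which starts a small init process (docker-init, built on tini) as PID 1 in the container to forward signals and reap processes. The sketch below assumes the ENTRYPOINT change above has not been applied to the image, since running both would be redundant; it is a variation on the wrapper, not what the answer actually used:

```shell
#!/bin/bash
# Sketch only: same wrapper, but relying on `docker run --init`
# instead of installing tini in the image.
ansible_base() {
  docker run --rm --init someorg/ansible:latest "$@"
}

# Usage (left commented out so the sketch has no side effects):
# ansible_base sleep 30
```

Either way, the point is the same: something other than the ansible command itself sits at PID 1 and passes SIGTERM on to it.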


Answered By - Sammitch
Answer Checked By - Mary Flores (WPSolving Volunteer)