Tuesday, February 22, 2022

[SOLVED] What does gdb change in debugged program behaviour w.r.t. `accept()` and `close()`

Issue

I've been chasing a problem: the application hangs on exit, but under a debugger everything works. It boiled down to a wrong assumption someone made 15 years ago: that if one thread is waiting in accept(), closing that handle in another thread will cause accept() to fail. Some of the process unwinding code hinged on this assumption (and I know the assumption is not correct).

Question: why does this assumption hold when the program is being debugged? What precisely changes in the execution environment?

Edit: observed on CentOS 7

Edit 2: I know it is UB and I need to fix it. My question is not "what to do?" but "why does it happen?". I am curious because the ability to sense a debugger via side effects like this one is pretty cool and may come in handy one day.

Edit 3:

I've discovered that if your process has a signal handler installed and (after closing the fd) you send that signal (via pthread_kill()) to the thread currently sleeping in accept(), that call always returns immediately (with an EBADF error). It doesn't matter what your handler does (as long as it returns). It looks like signal delivery wakes the thread up, interrupts accept() and restarts it (at which point it checks whether the related file "handle" is still good and exits with an error).

I am not encouraging anyone to rely on this behavior, only proposing a possible explanation for the original question: maybe gdb periodically wakes up every thread with some signal? Or does being ptraced mean the kernel (for some reason) periodically wakes up every thread "as if" it had been interrupted by a signal?
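The close-then-signal observation from Edit 3 can be reproduced with a small experiment. What follows is only a rough sketch in Python, not the original program (which is presumably C or C++). The function name close_then_signal and the settle delay are made up for illustration. It assumes Linux, and it leans on CPython retrying accept() after EINTR (PEP 475), which stands in for the syscall restart described above.

```python
import errno
import os
import signal
import socket
import threading
import time


def close_then_signal(settle=0.3):
    """Close a listening fd under a thread blocked in accept(), then signal it.

    Hypothetical helper for illustration only.
    Returns (still_blocked_after_close, errno_name_seen_by_accept).
    """
    # Any handler that returns will do. It has to be installed from the main
    # thread, and it is deliberately left installed afterwards.
    signal.signal(signal.SIGUSR1, lambda signum, frame: None)

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    outcome = {}

    def worker():
        try:
            conn, _ = listener.accept()  # blocks: nobody ever connects
            conn.close()
        except OSError as exc:
            outcome["errno"] = exc.errno

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    time.sleep(settle)  # give the worker time to block in accept()

    os.close(listener.fileno())  # close the descriptor behind accept()'s back
    time.sleep(settle)
    still_blocked = thread.is_alive()

    if still_blocked:
        # Interrupt the sleeping thread; the retried accept() has to look the
        # descriptor up again and finds it gone.
        signal.pthread_kill(thread.ident, signal.SIGUSR1)
    thread.join(timeout=5)

    listener.detach()  # the fd is already closed; keep the object from closing it again
    return still_blocked, errno.errorcode.get(outcome.get("errno"))
```

On a Linux system matching the observations above, calling close_then_signal() should report that the thread stayed blocked after close() and that accept() failed with EBADF once the signal arrived. Other platforms may behave differently.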


Solution

From the ptrace(2) man page:

Signal injection and suppression

After signal-delivery-stop is observed by the tracer, the tracer should restart the tracee with the call

   ptrace(PTRACE_restart, pid, 0, sig)

where PTRACE_restart is one of the restarting ptrace requests. If sig is 0, then a signal is not delivered. Otherwise, the signal sig is delivered. This operation is called signal injection in this manual page, to distinguish it from signal-delivery-stop.

The sig value may be different from the WSTOPSIG(status) value: the tracer can cause a different signal to be injected.

Note that a suppressed signal still causes system calls to return prematurely. In this case, system calls will be restarted: the tracer will observe the tracee to reexecute the interrupted system call (or restart_syscall(2) system call for a few system calls which use a different mechanism for restarting) if the tracer uses PTRACE_SYSCALL. Even system calls (such as poll(2)) which are not restartable after signal are restarted after signal is suppressed; however, kernel bugs exist which cause some system calls to fail with EINTR even though no observable signal is injected to the tracee.

In my case the termination logic starts with the delivery of a signal, which gdb intercepts and then injects into the process being traced. strace-ing gdb produces:

ptrace(PTRACE_PEEKTEXT, 2274, 0x7f0c9d0ee2e0, [0x7f0ca013ab80]) = 0      <-- gdb woke up to SIGTERM directed at tracee
ptrace(PTRACE_PEEKUSER, 2274, 8*SS + 8, [0x7f0ca013a8c0]) = 0
ptrace(PTRACE_GETREGS, 2274, 0, 0x7ffdb4f2c3a0) = 0
...
ptrace(PTRACE_CONT, 2338, 0x1, SIG_0)   = 0
ptrace(PTRACE_CONT, 2274, 0x1, SIGTERM) = 0     <-- SIGTERM is delivered to a thread chosen by kernel
...
ptrace(PTRACE_CONT, 2276, 0x1, SIG_0)   = 0     <-- all other threads are restarted
...

Note that this alone is not enough to fully explain the behavior, because the thread sleeping in accept() can restart the syscall before the other thread closes the file descriptor.

But the strace log is chock-full of similar sequences of commands (PTRACE_PEEKTEXT followed by an ever-shrinking number of PTRACE_CONT calls). What happens here is that gdb wakes up on every thread termination, pulls some data out of the tracee and restarts the (remaining) threads, causing their syscalls to be restarted. I.e. as threads exit one after another, each remaining thread gets stopped and restarted multiple times, eventually causing accept() to be restarted after the file descriptor has been closed by another thread. In fact this is guaranteed to happen, because the thread that closes the descriptor exits after closing it.



Answered By - C.M.
Answer Checked By - Mildred Charles (WPSolving Admin)