Issue
I am working on the tiny shell lab in CSAPP, but my shell gets stuck after I enter a command line.
steven@Steven:/mnt/f/大学/CSAPP/cmu15213/shlab-handout$ ./tsh
tsh> 123
tsh> 123: command not found
123
123
123
^\Terminating after receipt of SIGQUIT signal
steven@Steven:/mnt/f/大学/CSAPP/cmu15213/shlab-handout$
Link to the lab:
I modified the Makefile by adding -Og -g and tried to debug with GDB and VSCode.
I found that the program appears to get stuck on the sigprocmask call. If I click "Step Over", it continues to run and never stops.
I copied the relevant piece of code and ran it separately, and it works correctly.
- code for testing: https://paste.ubuntu.com/p/MWZybqZqkZ/
- tsh.c: https://paste.ubuntu.com/p/g86JpxvJhv/
- entire folder: https://filetransfer.io/data-package/2BkyuuSI#link
I have tested this both in WSL and a virtual machine, and both exhibited the same behavior.
Solution
if I click "Step Over", it continues to run and never stops.
I reproduced that. So let's see what's going on.
gdb -q ./tsh
(gdb) b tsh.c:191
Breakpoint 1 at 0x40156a: file tsh.c, line 191.
(gdb) run
Starting program: /tmp/tsh
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
tsh> aaa
[Detaching after fork from child process 182]
aaa: command not found
Breakpoint 1, eval (cmdline=0x7fffffffd930 "aaa\n") at tsh.c:191
191 sigprocmask(SIG_SETMASK, &prev, NULL);
(gdb) n
At this point everything hangs, so we must be stuck on sigprocmask, right?
Actually, we are not.
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7eb5c37 in __GI___wait4 (pid=-1, stat_loc=0x7fffffffcd5c, options=3, usage=0x0)
at ../sysdeps/unix/sysv/linux/wait4.c:30
30 return SYSCALL_CANCEL (wait4, pid, stat_loc, options, usage);
(gdb) bt
#0 0x00007ffff7eb5c37 in __GI___wait4 (pid=-1, stat_loc=0x7fffffffcd5c, options=3, usage=0x0)
at ../sysdeps/unix/sysv/linux/wait4.c:30
#1 0x0000000000401a02 in sigchld_handler (sig=17) at tsh.c:383
#2 <signal handler called>
#3 __GI___pthread_sigmask (how=2, newmask=<optimized out>, oldmask=0x0) at pthread_sigmask.c:43
#4 0x00007ffff7e18d8d in __GI___sigprocmask (how=<optimized out>, set=<optimized out>, oset=<optimized out>)
at ../sysdeps/unix/sysv/linux/sigprocmask.c:25
#5 0x0000000000401583 in eval (cmdline=0x7fffffffd930 "aaa\n") at tsh.c:191
#6 0x000000000040142d in main (argc=1, argv=0x7fffffffde68) at tsh.c:149
Now we see what's actually going on. The sigprocmask call unblocks SIGCHLD, which results in immediate delivery of that signal just before sigprocmask was about to return. That in turn invokes the sigchld_handler, which repeatedly calls waitpid in a never-ending loop.
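To see the "delivered before sigprocmask returns" behavior in isolation, here is a minimal sketch that is not taken from tsh.c. It uses SIGUSR1 instead of SIGCHLD and SIG_UNBLOCK instead of restoring a saved mask, and the helper name pending_signal_runs_on_unblock is made up for illustration. It assumes a single-threaded process.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t handler_ran = 0;

static void on_sigusr1(int sig)
{
    (void)sig;
    handler_ran = 1;
}

/*
 * Hypothetical helper (not part of tsh.c).
 * Returns 1 if the handler ran only once the signal was unblocked,
 * 0 if it ran at some other time or not at all, -1 on setup error.
 */
int pending_signal_runs_on_unblock(void)
{
    struct sigaction sa;
    sigset_t mask;
    int ran_while_blocked;

    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    /* Block SIGUSR1, then send it to ourselves: it stays pending. */
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;
    handler_ran = 0;
    if (raise(SIGUSR1) != 0)
        return -1;
    ran_while_blocked = handler_ran;

    /* Unblocking delivers the pending signal before this call returns. */
    if (sigprocmask(SIG_UNBLOCK, &mask, NULL) == -1)
        return -1;

    return !ran_while_blocked && handler_ran;
}
```

Calling pending_signal_runs_on_unblock() from main should return 1, which matches the backtrace above: the handler frame sits on top of the sigprocmask frame.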
Why doesn't the loop terminate? Because the code expects waitpid to return 0 when there are no children, but that is not correct: waitpid returns -1 (with errno set to ECHILD) in that case.
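The two return values are easy to check outside the shell. The following is a rough sketch, not code from tsh.c; both helper names are invented for illustration, and it assumes the calling process has no other children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: forks a child that just waits for a signal, polls it
 * once with WNOHANG while it is still running, then kills and reaps it.
 * Returns the value of the non-blocking waitpid call (expected: 0),
 * or -2 if fork failed.
 */
int poll_with_running_child(void)
{
    int status;
    pid_t polled;
    pid_t child = fork();

    if (child == -1)
        return -2;
    if (child == 0) {
        pause();            /* child: stay alive until signaled */
        _exit(0);
    }

    /* The child exists but has not changed state, so this returns 0. */
    polled = waitpid(child, &status, WNOHANG | WUNTRACED);

    kill(child, SIGKILL);   /* clean up the child */
    waitpid(child, &status, 0);
    return (int)polled;
}

/*
 * Hypothetical helper: returns 1 if a non-blocking waitpid reports that
 * there are no children at all (-1 with errno == ECHILD), 0 otherwise.
 */
int no_children_left(void)
{
    int status;
    pid_t pid = waitpid(-1, &status, WNOHANG | WUNTRACED);

    return pid == -1 && errno == ECHILD;
}
```

So 0 means "children exist, none ready yet", while "no children" is reported as -1. A loop that only stops on 0 keeps spinning once the last child has been reaped.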
The following fix makes tsh work as one might expect:
diff -u tsh.c.orig tsh.c
--- tsh.c.orig 2024-01-20 21:42:47.915401415 -0800
+++ tsh.c 2024-01-20 21:43:20.145996657 -0800
@@ -383,7 +383,7 @@
pid = waitpid(-1, &status, WNOHANG | WUNTRACED);
// If there are no zombie processes, return
- if (pid == 0)
+ if (pid == 0 || pid == -1)
return;
// If a child's termination caused waitpid to return from blocking
./tsh
tsh> aaa
aaa: command not found
tsh> bbb
bbb: command not found
tsh>
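As an alternative to testing for 0 and -1 separately, a common way to write the reaping loop is to continue only while waitpid returns a positive pid. This is a sketch under assumptions, not the code from tsh.c: the name sigchld_handler_sketch and the reaped_count counter are made up, and the job-list bookkeeping a real shell needs is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical bookkeeping for this sketch: number of terminated children reaped. */
static volatile sig_atomic_t reaped_count = 0;

void sigchld_handler_sketch(int sig)
{
    int saved_errno = errno;    /* waitpid may overwrite errno */
    int status;
    pid_t pid;

    (void)sig;

    /*
     * pid > 0: a child changed state and was reported.
     * pid == 0: children exist, but none has changed state.
     * pid == -1: error, typically ECHILD (no children at all).
     * Both 0 and -1 end the loop.
     */
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        if (WIFEXITED(status) || WIFSIGNALED(status))
            reaped_count++;
        /* A real shell would update its job list here (omitted). */
    }

    errno = saved_errno;        /* do not disturb the interrupted code */
}
```

Written this way, the handler returns promptly whether there are running children or none, so the loop cannot spin on the -1 case.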
Here is the relevant text from Linux man page:
If waitpid() was invoked with WNOHANG set in options, it has at least one child process specified by pid for which status is not available, and status is not available for any process specified by pid, 0 is returned. Otherwise, -1 shall be returned.
Answered By - Employed Russian Answer Checked By - Gilberto Lyons (WPSolving Admin)