Issue
I would like to know what is wrong with this implementation of piping. I am trying to implement the following command ls |grep "main-pipe" |wc
. However, it gets into infinite loop and I don't understand why it doesn't read from the standard input. I have also checked whether the output from ls
is correctly obtained in the second process(for grep
) is correct and it was correct. I can't figure out what causes the infinite loop. Could you please help me with that?
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
int main(int argc, char* argv[]) {
if (argc > 1) {
printf("Please don't provide additional arguments.\n");
exit(1);
}
int pipe1[2];
pipe(pipe1);
int pid = fork();
if (pid == 0) {
// Write to the write end of the pipe
close(1);
dup(pipe1[1]);
close(pipe1[1]);
close(pipe1[0]);
execlp("ls", "ls");
}
wait(0);
int pipe2[2];
pipe(pipe2);
pid = fork();
if (pid == 0) {
close(0);
dup(pipe1[0]);
close(pipe1[0]);
close(pipe1[1]);
close(1);
dup(pipe2[1]);
close(pipe2[0]);
close(pipe2[1]);
execlp("grep", "grep", "main-pipe", (char *)NULL);
}
wait(0);
close(0);
dup(pipe2[0]);
close(pipe1[0]);
close(pipe1[1]);
close(pipe2[0]);
close(pipe2[1]);
execl("wc", "wc");
exit(0);
}
Solution
it gets into infinite loop and I don't understand why it doesn't read from the standard input.
An infinite loop is not the only reason for failing to make progress. There's no reason to think that any process is failing to attempt to read from its standard input. In this case, in fact, attempting to read from the standard input is exactly what prevents the grep
process from making progress.
You have performed the redirections correctly, though as I noted in comments, this is among the many cases where dup2()
would be a better choice than dup()
.
The primary issue is your wait(0)
calls. These are semantically inappropriate, because they interfere with one of the key characteristics of a shell-style pipeline, that the participating processes run in parallel, each consuming the output of the previous more-or-less as it is produced. More importantly, however, the wait()
s cause a practical issue for you, amounting to an inter-process deadlock.
You need to appreciate several things:
A process operating as a UNIX-style filter reads and processes data from its standard input until it detects end-of-file. Typically, such a process emits its results to its standard output, allowing multiple such processes to be composed.
grep
works this way when reading from its standard input.EOF is observed on the read end of a pipe only after all copies of the write end of that pipe have been closed. Until then, readers can never be confident that there won't eventually be more data to read.
Each process has its own file descriptors, but multiple processes can all have their own file descriptors that refer to the same underlying open file description. The numeric values of those FDs sometimes are the same, but they don't need to be. Such multiple references to the same open file description especially happen as a result of file descriptors being duplicated across a
fork()
.
Now consider how that all plays out for your program.
The parent sets up a pipe whose ends are stored in
pipe1
.It forks two children, one to run
ls
and the other to rungrep
. Each of these inherits copies of the parent's file descriptors for the pipe ends.- I'm ignoring the first
wait()
here. Ifls
's output were very long then thiswait()
could cause a problem, with the pipe's buffer filling to capacity and therefore preventingls
from finishing. But that's unlikely to be what's actually happening in your case.
- I'm ignoring the first
Each child performs appropriate redirection. The
ls
redirects its standard output into the write end of the pipe, and thegrep
redirects is standard input from the read end of the pipe.Each of those children closes its own excess copies of the file descriptors for the ends of that pipe. At this point, however, the parent still has open file descriptors for both ends of that pipe.
When the
ls
terminates, all its open file descriptors are closed automatically. This is what ordinarily would allow thegrep
process to observe EOF on its standard input, and therefore to itself terminate. That does not happen in your program, however, because the parent still has an open file descriptor for the write end of that pipe.Meanwhile, the parent is blocked at the second
wait()
. Like the previouswait()
this one could be an issue strictly in and of itself ifgrep
produced large enough output to fill the buffer of the pipe to which its output is connected. That's moot in this case, however, because the parent defers closing its copies of thepipe1
file descriptors until after thewait()
returns, while at the same time, thegrep
process cannot terminate normally until the parent closes the write end of that pipe. Neither process can make progress once thegrep
has consumed all the output fromls
-- they are deadlocked.
You could solve this particular deadlock by having the parent close at least the write end of pipe1 as soon as it forks the second child, at which point the parent has no further use for that pipe anyway. Closing unneeded pipe ends as soon as possible is good programming practice, verging on essential. But eliminating the wait(0)
calls is necessary anyway to avoid other issues that I've touched on, and doing that would be sufficient to allow the parent to perform all its necessary pipe-end closures soon enough to resolve the deadlock.
Answered By - John Bollinger Answer Checked By - Gilberto Lyons (WPSolving Admin)