Wednesday, August 31, 2022

[SOLVED] What does the Unix Philosophy say that a tool should do with data that ends with a sequence of characters and no newline?

Issue

It is my understanding that Unix defines a "line" as a sequence of zero or more characters followed by a newline. Do I understand correctly?

The last line is a "line" (of course) so the last line must have a newline. Is that correct?

Suppose there is a sequence of characters, a newline, and then a sequence of characters. That is, no newline after the last sequence of characters. What does that mean? Does it mean that it is bad/invalid data? What does the Unix Philosophy say that a tool should do with such data? Reject it? Process all lines and ignore the last sequence of characters? Something else?


Solution

Here's a handful of examples from Linux:

$ printf 'line\neof' > y
$ cat y
line
eof$ 
$ wc -l y
1 y
$ grep eof y
eof
$ tac y
eofline
$ rev y
enil
foe$ sort y
eof
line
$ tail -n 1 y
eof$ sed -n 1p y
line
$ sed -n 2p y
eof$

As you can see, the behavior isn't consistent (wherever output runs straight into the next $ prompt, the tool did not emit a final newline):

  • cat and wc are very literal: cat copies the bytes unchanged, and wc -l counts newline characters, so it reports 1
  • grep and sort add a newline
  • rev, sed and tail consider the last line but don't add a newline
  • tac reverses the two pieces but keeps each exactly as it was, so the unterminated eof runs straight into line
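
To see which case a given file falls into, it can help to test its last byte directly and, if needed, terminate the last line before handing the data to a stricter tool. The following is only a sketch: the function names ends_with_newline and terminate_last_line are made up for illustration, and it assumes a POSIX-style shell with tail, awk and mktemp available.

```shell
# Hypothetical helper: succeed if the file is non-empty and its final byte
# is a newline. Command substitution strips trailing newlines, so a final
# newline leaves an empty string and any other final character does not.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 -- "$1")" ]
}

# Hypothetical helper: copy stdin to stdout, terminating the last line
# if needed. awk prints every record followed by a newline.
terminate_last_line() {
    awk 1
}

tmp=$(mktemp)
printf 'line\neof' > "$tmp"
ends_with_newline "$tmp" || echo "last line is unterminated"
terminate_last_line < "$tmp" | wc -l   # now reports 2 instead of 1
rm -f "$tmp"
```

Input that already ends with a newline passes through terminate_last_line unchanged, so it should be safe to put in front of any tool in a pipeline.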

But you'll also note:

  • None of these programs treat it as invalid data.
  • None of these programs ignore the part after the last newline.
  • For the most part, these programs work as the user would expect when piped together.
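
A script of your own can follow the same convention. The sketch below uses a common shell idiom for reading lines without dropping an unterminated last one; the function name each_line is hypothetical, and the behavior of read at end of file is as observed in shells such as bash and dash.

```shell
# Hypothetical helper: print each line of stdin wrapped in brackets,
# including a final line that lacks a trailing newline.
each_line() {
    # read returns non-zero at end of file even when it has stored a
    # partial line in $line, so the extra [ -n "$line" ] test keeps
    # that last fragment instead of silently dropping it.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '[%s]\n' "$line"
    done
}

printf 'line\neof' | each_line
```

Without the || [ -n "$line" ] part, the loop would stop after the first line and ignore eof, which is exactly the behavior none of the tools above exhibit.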

So if there's any "Unix philosophy" takeaway here, it's less about newlines and more about input handling: accept the data, process whatever follows the last newline, and don't reject it.



Answered By - root
Answer Checked By - Robin (WPSolving Admin)