Issue
diff and similar tools seem to compare files, not content that happens to be in the form of lines in files. That is, they consider the position of each line in the file as significant and part of the comparison.
What about when you just don't care about position? I simply want to compare two lists as a set operation, without any regard to position. Here each line can be considered a list element. So I'm looking for the difference between the lines in file1 and file2, and between file2 and file1.
I don't want to see positional information or do any pairwise comparison, just a result set for each operation. For example:
SET1: a b c d f g
SET2: a b c e g h
SET1 - SET2 = d f
SET2 - SET1 = e h
Can I do this easily in bash? Obviously it's fine to sort the list first or not, but sorting is not intrinsically a prerequisite to working with sets.
Solution
For simple line-oriented comparisons, the comm command might be all you need:
$ tail a.txt b.txt
==> a.txt <==
a
b
c
d
f
g
==> b.txt <==
a
b
c
e
g
h
$ comm -23 <(sort a.txt) <(sort b.txt)
d
f
$ comm -13 <(sort a.txt) <(sort b.txt)
e
h
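To unpack the flags: comm prints three columns (lines only in the first input, lines only in the second, lines in both), and -1, -2 and -3 each suppress the corresponding column. Here is a minimal sketch that wraps this in functions. The names set_minus and set_intersect are hypothetical, chosen only for illustration, and the process substitution assumes bash rather than plain sh.

```shell
# Hypothetical helper names; only comm and sort are real commands here.

# Lines present in file $1 but not in file $2 (SET1 - SET2).
set_minus() {
    # -2 drops "only in second" and -3 drops "in both",
    # which leaves the "only in first" column.
    comm -23 <(sort "$1") <(sort "$2")
}

# Lines present in both files (the intersection).
set_intersect() {
    # -1 and -2 drop both "only in ..." columns,
    # which leaves the lines common to the two inputs.
    comm -12 <(sort "$1") <(sort "$2")
}
```

With the sample files above, set_minus a.txt b.txt should print d and f, set_minus b.txt a.txt should print e and h, and set_intersect a.txt b.txt should print a, b, c and g.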
Also, it's probably worth enabling the --unique flag on sort in order to remove duplicate lines:
comm -23 <(sort --unique a.txt) <(sort --unique b.txt)
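Since the question points out that sorting is not inherently required for set operations, here is a hedged sketch of a sort-free variant that uses grep instead of comm. This is a swapped-in technique, not part of the answer above, and the function name lines_only_in_first is hypothetical. It treats each line of the second file as a fixed string (-F) that must match a whole line (-x), reads those patterns from a file (-f), and inverts the match (-v).

```shell
# Hypothetical helper: print the lines of file $1 that do not appear in file $2.
# No sorting is involved, so the output keeps the order (and any duplicates)
# of the first file.
lines_only_in_first() {
    # -F: patterns are literal strings, not regular expressions
    # -x: a pattern must match the entire line
    # -v: select the lines that do NOT match
    # -f "$2": read one pattern per line from the second file
    grep -Fxv -f "$2" -- "$1"
}
```

With the sample files, lines_only_in_first a.txt b.txt should print d and f. Note that grep exits with status 1 when it selects no lines, and that behavior with an empty second file may differ between grep implementations, so that case is worth checking on your system.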
Answered By - Ionuț G. Stan Answer Checked By - Timothy Miller (WPSolving Admin)