Friday, October 7, 2022

[SOLVED] How do I compare lines in two files WITHOUT respect to their position in those files (set difference operation)

Issue

diff and similar tools seem to compare files, not content that happens to be in the form of lines in files. That is, they treat the position of each line in the file as significant and as part of the comparison.

What about when you just don't care about position? I simply want to compare two lists, more like a set operation, without regard to position. Each line can be treated as a list element. So I'm looking for the difference between the lines in file1 and file2, and between the lines in file2 and file1.

I don't want to see positional information or do any pairwise comparison, just a result set for each operation. For example:

SET1: a b c d f g

SET2: a b c e g h

SET1 - SET2 = d f

SET2 - SET1 = e h

Can I do this easily in bash? Sorting the lists first is fine, but sorting is not intrinsically a prerequisite for working with sets.


Solution

For simple line-oriented comparisons, the comm command might be all you need:

$ tail a.txt b.txt 
==> a.txt <==
a
b
c
d
f
g

==> b.txt <==
a
b
c
e
g
h
$ comm -23 <(sort a.txt) <(sort b.txt)
d
f
$ comm -13 <(sort a.txt) <(sort b.txt)
e
h
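To reproduce the session above from scratch, the sketch below recreates a.txt and b.txt with printf and runs the same two comm commands. It assumes bash (the `<(...)` process substitution is not available in plain POSIX sh) and it overwrites any existing a.txt and b.txt in the current directory.

```shell
#!/usr/bin/env bash
# Recreate the two example files from the question, one element per line.
printf '%s\n' a b c d f g > a.txt
printf '%s\n' a b c e g h > b.txt

# comm requires both inputs to be sorted; process substitution feeds it
# sorted copies without writing temporary files.

# -2 hides lines only in b.txt, -3 hides lines in both: SET1 - SET2
comm -23 <(sort a.txt) <(sort b.txt)

# -1 hides lines only in a.txt, -3 hides lines in both: SET2 - SET1
comm -13 <(sort a.txt) <(sort b.txt)
```

The same pattern covers the intersection as well: `comm -12` suppresses the two "only in one file" columns and prints just the lines common to both.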

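It's probably also worth enabling the --unique (-u) flag on sort to remove duplicate lines. Otherwise comm treats repeated lines as separate elements, so a line that appears twice in a.txt but only once in b.txt still shows up once in the output: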

comm -23 <(sort --unique a.txt) <(sort --unique b.txt)
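The question points out that sorting is not intrinsically required for set operations. If you would rather skip it, for example to keep lines in their original order, one alternative (a different technique from comm) is a hash lookup in awk. The sketch below uses a hypothetical helper named set_minus: it loads every line of the second file into an awk associative array, then prints the lines of the first file that are missing from it.

```shell
#!/usr/bin/env bash
# set_minus FILE1 FILE2 (hypothetical helper name): print the lines of FILE1
# that never appear in FILE2, in FILE1's original order, each distinct line
# at most once. No sorting is involved.
set_minus() {
  awk '
    # While reading the first file argument (FILE2), remember every line.
    FILENAME == ARGV[1] { seen[$0]; next }
    # For FILE1, print a line if FILE2 lacked it and it was not printed yet.
    !($0 in seen) && !printed[$0]++
  ' "$2" "$1"
}

printf '%s\n' a b c d f g > a.txt
printf '%s\n' a b c e g h > b.txt

set_minus a.txt b.txt   # SET1 - SET2
set_minus b.txt a.txt   # SET2 - SET1
```

This trades the O(n log n) sorting step for memory proportional to the number of distinct lines in the second file. Testing `FILENAME == ARGV[1]` instead of the common `NR == FNR` idiom keeps the helper correct when the second file is empty. Because of the `printed[]` check, each distinct line is reported once, much like sort --unique; drop `&& !printed[$0]++` if you want repeated lines in the first file reported every time they occur.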


Answered By - Ionuț G. Stan
Answer Checked By - Timothy Miller (WPSolving Admin)