Saturday, April 9, 2022

[SOLVED] Read each line from two files and print lines that do not exist in the other

Issue

Team, I have two files with some duplicate lines. I want to print, or collect in a new list, the lines that are unique. However, my list prints as empty and I am not sure why.

f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
        for lineB in f2.readlines():
            if lineA != lineB:
                print("lineA not equal to lineB", lineA, lineB)
            else:
                unique.append(lineB)
print(unique)

output

lineA not equal to lineB  node789
  node321

lineA not equal to lineB  node789
 node12345

[]

expected

lineA not equal to lineB  node789
  node321

lineA not equal to lineB  node789
 node12345

[node321,node12345]

Second approach: after looking at the comments, the list is now getting populated, but every entry is empty and the actual strings are not recognized.

 [~] $ cat  ~/backup/2strings.log
restr1
restr2

 [~] $ cat ~/backup/4strings.log 
restr1
restr2
restr3
restr4


file2 = os.environ.get('HOME') + '/backup/2strings.log'
file1 = os.environ.get('HOME') + '/backup/4strings.log'
f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
        for lineB in f2.readlines():
            # if lineA.rstrip() != lineB.rstrip():
            if lineA.strip() != lineB.strip():
                print("lineA not equal to lineB", lineA, lineB)
            else:
                print("found uniq")
        unique.append(lineB.rstrip())
print(unique)
print(len(unique))

output

found uniq
lineA not equal to lineB restr1
 restr2

lineA not equal to lineB restr1
 

['', '', '', '', '']
5

Solution

I recommend a different, simpler approach: use Python's set data structure. See https://docs.python.org/3/tutorial/datastructures.html#sets

Pseudo code

unique = []
# file1 and file2 are the paths defined in your code
with open(file1) as f1, open(file2) as f2:
    items01 = {line.strip() for line in f1}
    items02 = {line.strip() for line in f2}

# items in file1 that are not present in file2
print(list(items01 - items02))
unique += list(items01 - items02)

# items in file2 that are not present in file1
print(list(items02 - items01))
unique += list(items02 - items01)

# all items that appear in only one of the two files
print(unique)
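
The same idea can be tried without touching real files. Below is a minimal sketch, assuming the sample contents of 2strings.log and 4strings.log from the question are held in in-memory file objects (io.StringIO). The helper name unique_lines is hypothetical and not part of the original answer. It uses the symmetric difference operator ^, which gives the same items as combining the two set differences in the answer's code, and it additionally skips blank lines and sorts the result, since sets have no defined order.

```python
import io


def unique_lines(file_a, file_b):
    """Return the sorted lines that appear in exactly one of the two files."""
    # Strip surrounding whitespace/newlines and skip blank lines (an assumption
    # added here; the answer's code keeps blank lines as empty strings).
    items_a = {line.strip() for line in file_a if line.strip()}
    items_b = {line.strip() for line in file_b if line.strip()}
    # Symmetric difference: (items_a - items_b) | (items_b - items_a)
    return sorted(items_a ^ items_b)


# Stand-ins for the two log files shown in the question.
two_strings = io.StringIO("restr1\nrestr2\n\n")
four_strings = io.StringIO("restr1\nrestr2\nrestr3\nrestr4\n")

result = unique_lines(four_strings, two_strings)
print(result)  # ['restr3', 'restr4']
```

With real files, the two StringIO objects would be replaced by open(file1) and open(file2) inside a with statement.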

In your code, you only use file1 as the reference and check its lines against file2. You need to do the reverse as well. The second problem is time complexity: the nested loops compare every line of one file with every line of the other. Python sets use hashing internally, which makes lookups and differences fast, so use sets.
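
As for why the nested loops in the question misbehave: a file object can only be read once until it is rewound, so calling f2.readlines() inside the outer loop returns an empty list on every iteration after the first. The sketch below illustrates that, and then shows a single-pass lookup that keeps the original line order. It is an illustration under assumptions, not the answerer's code: io.StringIO stands in for the real files, and the helper name lines_missing_from is hypothetical.

```python
import io

f2 = io.StringIO("restr1\nrestr2\n")

first_read = f2.readlines()   # consumes the whole file
second_read = f2.readlines()  # position is already at the end, nothing left
print(first_read)   # ['restr1\n', 'restr2\n']
print(second_read)  # []


def lines_missing_from(reference_lines, other_lines):
    """Return stripped lines of reference_lines not found in other_lines, in order."""
    seen = {line.strip() for line in other_lines}  # hashed lookup, built once
    result = []
    for line in reference_lines:
        item = line.strip()
        if item and item not in seen:  # skip blank lines and known items
            result.append(item)
    return result


f1 = io.StringIO("restr1\nrestr2\nrestr3\nrestr4\n")
f2.seek(0)  # rewind before reading f2 again
missing = lines_missing_from(f1, f2)
print(missing)  # ['restr3', 'restr4']
```

Calling the helper a second time with the arguments swapped (after rewinding both files) covers the reverse direction.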



Answered By - sam
Answer Checked By - Gilberto Lyons (WPSolving Admin)