Saturday, April 9, 2022

[SOLVED] Compare 2 files on specific row only

Issue

I need to compare 2 files and find the matching lines. The catch is that only the 4th of the 5 quote-delimited fields on each line of DocumentList.xml should be compared, and the entire line should be returned if that field matches a line in final.txt.

cat DocumentList.xml
<?xml version="1.0" encoding="UTF-8" ?> <block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">
<block-list:block block-list:abbreviated-name="adn" block-list:name="and" />
<block-list:block block-list:abbreviated-name="tesst" block-list:name="test" />
<block-list:block block-list:abbreviated-name="tust" block-list:name="test" />
<block-list:block block-list:abbreviated-name="seme" block-list:name="same"/>

And the second file is:

cat final.txt
and
test
india

I can extract the fourth field using this command, but I do not know how to compare it with the lines from final.txt:

awk -F '\"' '{print $4}' DocumentList.xml

Expected Result:

<block-list:block block-list:abbreviated-name="adn" block-list:name="and" />
<block-list:block block-list:abbreviated-name="tesst" block-list:name="test" />
<block-list:block block-list:abbreviated-name="tust" block-list:name="test" />

I have also tried something like this, but it does not return the entire line from DocumentList.xml:

awk -F '\"' 'FNR==NR {a[$4]; next} $1 in a'  DocumentList.xml final.txt

final.txt is 1 GB, DocumentList.xml is 25 MB, and both contain Unicode characters.


Solution

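Just swap the order in which the files are read. Your attempt prints lines from final.txt because the file given last is the one whose lines get printed. Reading final.txt first stores each of its lines as an array key; awk then prints every line of DocumentList.xml whose 4th field is among those keys: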

awk -F '\"' 'FNR==NR {a[$0]; next} $4 in a' final.txt DocumentList.xml
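
For anyone who wants to try this out, here is the same command spread over several lines with comments, run against the sample data from the question. This is only a sketch: the scratch directory and the names `workdir`, `seen` and `result` are made up for the illustration, and `seen` plays the role of `a` in the one-liner.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > DocumentList.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?> <block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">
<block-list:block block-list:abbreviated-name="adn" block-list:name="and" />
<block-list:block block-list:abbreviated-name="tesst" block-list:name="test" />
<block-list:block block-list:abbreviated-name="tust" block-list:name="test" />
<block-list:block block-list:abbreviated-name="seme" block-list:name="same"/>
EOF

printf '%s\n' and test india > final.txt

result=$(awk -F'"' '
    # First file (final.txt): FNR equals NR only here, so store each whole line as a key.
    FNR == NR { seen[$0]; next }
    # Second file (DocumentList.xml): print the line if its 4th quote-delimited field was stored.
    $4 in seen
' final.txt DocumentList.xml)

printf '%s\n' "$result"
```

On the sample data this prints the three lines whose name is "and" or "test", in the order they appear in DocumentList.xml.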

Output:

<block-list:block block-list:abbreviated-name="adn" block-list:name="and" />
<block-list:block block-list:abbreviated-name="tesst" block-list:name="test" />
<block-list:block block-list:abbreviated-name="tust" block-list:name="test" />


Answered By - tshiono
Answer Checked By - Terry (WPSolving Volunteer)