Issue
I have 2 CSV files (that are |
delimited) I am trying to compare in a bash script. File A has 5 columns; File B has 1 column. I do not have any headers currently, was going to add them in later. Column 5 in File 1 contains some entries that match Column 1 in File 2. I would like to end up with 2 new files: File C that has Columns 1-5 from File A where File A Column 1 and File B Column 1 match; File D that has Columns 1-5 from File A where File A Column 1 and File B Column 1 do not match.
Sample Text from File A:
CT240021313|2024-02-04 00:00:02|CT|1.2.345.6
AB123|2024-02-01 11:22:33|MR|7.8.98.102
PDQ987|2024-02-04 8:50:11|XR|1.23.45.6
PA7531|2024-02-03 9:34:52|CT|6.3.489
Sample Text from File B:
1.2.345.6
6.3.489
Expected output File C:
CT240021313|2024-02-04 00:00:02|CT|1.2.345.6
PA7531|2024-02-03 9:34:52|CT|6.3.489
Expected output File D:
AB123|2024-02-01 11:22:33|MR|7.8.98.102
PDQ987|2024-02-04 8:50:11|XR|1.23.45.6
I've tried several variations on awk, with little to no success even finding any matches. Yet I can export the files and import into Excel, and highlight the matching values from File A Column 5 and File B Column 1 and it finds many matches. What I have currently, which returns basically every line in File A:
awk 'NR==FNR{a[$0];next !($0 in a)' file_a.csv file_b.csv > file_c.csv
Solution
Try
awk -F '|' 'NR==FNR{b[$0];next} ($4 in b){print>"C";next} {print>"D"}' B A
You were already on a good track, but had the order mixed up.
Here, we specify the input files the other way around. First, we read B
into memory, and then we partition A
based on the values we saw in B
before.
Answered By - Socowi Answer Checked By - David Goodson (WPSolving Volunteer)