Issue
I found this very difficult to solve in bash: I have two files and want to find the rows they have in common, based on two columns.
f1.csv:
col1,col2,col3,col4
Dalir,Cpne1,down,2174
Fendrr,Aco2,up,280
Cpne1,Tox1,down,8900
f2.csv:
col1,col2,col3,col4,col5,col6
Linc,Rmo,ch2,ch2,p,l
Tox1,Cpne1,ch1,ch2,l,p
So basically, the code should look only at the first two columns of each file and check whether the pairs are the same (the order within a pair is not important). For example, the first file has Cpne1,Tox1 in the third row and the second file has Tox1,Cpne1 in the second row, so this pair should be printed in the output as it appears in the second file.
Desired output:
Tox1,Cpne1
Unfortunately, I have not been able to develop a bash command for this - it would be great if you could help me with this. Thanks
Solution
Just adding the explanation to oguz' fine answer in the comments above:
BEGIN{FS=OFS=","}
defines , to be the field separator for both input and output.
NR==FNR{pair[$1,$2];next}
While the record number across the entire input matches the current file's record number (in other words, while reading the first file), add an element indexed by the first and second fields to the array pair, then skip to the next record.
($1,$2) in pair||($2,$1) in pair{print $1,$2}
Operating on the second file, check whether fields one and two, in either order, are present as an index in the array pair, and print them if they are.
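Put together, the three pieces above might be run as in the sketch below. It recreates the sample files from the question in a scratch directory (the tmp and out variable names are only for illustration). One addition that is not part of the explained command: an FNR==1{next} rule that skips each file's header line. Without it, the header pair col1,col2 appears in both sample files and would be printed as well, which the desired output does not include. This assumes both files always start with a header row.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)

cat > "$tmp/f1.csv" <<'EOF'
col1,col2,col3,col4
Dalir,Cpne1,down,2174
Fendrr,Aco2,up,280
Cpne1,Tox1,down,8900
EOF

cat > "$tmp/f2.csv" <<'EOF'
col1,col2,col3,col4,col5,col6
Linc,Rmo,ch2,ch2,p,l
Tox1,Cpne1,ch1,ch2,l,p
EOF

out=$(awk '
  # Use a comma as the field separator for input and output.
  BEGIN { FS = OFS = "," }

  # Assumption (not in the explained command): skip the header of each file.
  FNR == 1 { next }

  # First file: remember every (col1, col2) pair as an array index.
  NR == FNR { pair[$1, $2]; next }

  # Second file: print the pair if it was seen in either order.
  ($1, $2) in pair || ($2, $1) in pair { print $1, $2 }
' "$tmp/f1.csv" "$tmp/f2.csv")

printf '%s\n' "$out"

# Clean up the scratch directory.
rm -rf "$tmp"
```

With the sample data this prints the single pair Tox1,Cpne1, in the order it appears in f2.csv. The file order on the command line matters: the first file fills the array and the second file is the one whose rows are printed.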
Answered By - tink Answer Checked By - Timothy Miller (WPSolving Admin)