Issue
I have two tab-delimited files.
file-1
NODE_1_length_59711_cov_84.026979_g0_i0_1 | K02377 |
NODE_1_length_59711_cov_84.026979_g0_i0_2 | |
NODE_2_length_39753_cov_84.026979_g0_i0_1 | K02377 |
NODE_2_length_49771_cov_84.026979_g0_i0_2 | K16554 |
................................ .....................totally 391443 lines
file2
NODE_1_length_59711_cov_84.026979_g0_i0_1 | 56.54 |
NODE_1_length_59711_cov_84.026979_g0_i0_2 | 51.0 |
NODE_2_length_39753_cov_84.026979_g0_i0_1 | 12.6 |
NODE_2_length_49771_cov_84.026979_g0_i0_2 | 18.9 |
................................ .....................totally 391249 lines
I want to merge this two files keeping the first column the same.
NODE_1_length_59711_cov_84.026979_g0_i0_1 | K02377 | 56.54 |
NODE_1_length_59711_cov_84.026979_g0_i0_2 | 51.0 | |
NODE_2_length_39753_cov_84.026979_g0_i0_1 | K02377 | 12.6 |
NODE_2_length_49771_cov_84.026979_g0_i0_2 | K16554 | 18.9 |
The problem is as the first file have almost 190 more lines I cannot directly combine them as it will give wrong output. Is there any way I can combine these files by the common string from the first column?
Solution
Using GNU awk:
awk 'NR==FNR { map[$1]=$2;next } { map1[$1]=$2 } END { PROCINFO["sorted_in"]="@ind_str_asc";for (i in map) { print i"\t"map[i]"\t"map1[i] } }' file-1 file2
Explanation:
awk 'NR==FNR {
map[$1]=$2; # Process the first file only and set up an array called map with the first space separated field as the index and the second the value
next
}
{
map1[$1]=$2 # When processing the second file, set up an second array called map1 and use the first field as the index and the second the value.
}
END {
PROCINFO["sorted_in"]="@ind_str_asc"; # Set the index ordering
for (i in map) {
print i"\t"map[i]"\t"map1[i] # Loop through the map array and print the values along with the values in map1.
}
}' file-1 file2
Answered By - Raman Sailopal