Tuesday, November 16, 2021

[SOLVED] Combining two columns from different files by common strings

Issue

I have two tab-delimited files.

file-1

NODE_1_length_59711_cov_84.026979_g0_i0_1 K02377
NODE_1_length_59711_cov_84.026979_g0_i0_2
NODE_2_length_39753_cov_84.026979_g0_i0_1 K02377
NODE_2_length_49771_cov_84.026979_g0_i0_2 K16554

................................ .....................totally 391443 lines

file2

NODE_1_length_59711_cov_84.026979_g0_i0_1 56.54
NODE_1_length_59711_cov_84.026979_g0_i0_2 51.0
NODE_2_length_39753_cov_84.026979_g0_i0_1 12.6
NODE_2_length_49771_cov_84.026979_g0_i0_2 18.9

................................ .....................totally 391249 lines

I want to merge this two files keeping the first column the same.

NODE_1_length_59711_cov_84.026979_g0_i0_1 K02377 56.54
NODE_1_length_59711_cov_84.026979_g0_i0_2 51.0
NODE_2_length_39753_cov_84.026979_g0_i0_1 K02377 12.6
NODE_2_length_49771_cov_84.026979_g0_i0_2 K16554 18.9

The problem is as the first file have almost 190 more lines I cannot directly combine them as it will give wrong output. Is there any way I can combine these files by the common string from the first column?


Solution

Using GNU awk:

awk 'NR==FNR { map[$1]=$2;next } { map1[$1]=$2 } END { PROCINFO["sorted_in"]="@ind_str_asc";for (i in map) { print i"\t"map[i]"\t"map1[i] } }' file-1 file2

Explanation:

awk 'NR==FNR { 
               map[$1]=$2;                                  # Process the first file only and set up an array called map with the first space separated field as the index and the second the value
               next 
             } 
             { 
               map1[$1]=$2                                  # When processing the second file, set up an second array called map1 and use the first field as the index and the second the value.
             } 
         END { 
               PROCINFO["sorted_in"]="@ind_str_asc";         # Set the index ordering
               for (i in map) { 
                 print i"\t"map[i]"\t"map1[i]                # Loop through the map array and print the values along with the values in map1.
               } 
              }' file-1 file2


Answered By - Raman Sailopal