Tuesday, November 16, 2021

[SOLVED] Combining two columns from different files by common strings

November 16, 2021 awk, echo, sed, unix

Issue

I have two tab-delimited files.

file-1


NODE_1_length_59711_cov_84.026979_g0_i0_1	K02377
NODE_1_length_59711_cov_84.026979_g0_i0_2
NODE_2_length_39753_cov_84.026979_g0_i0_1	K02377
NODE_2_length_49771_cov_84.026979_g0_i0_2	K16554

................................ .....................totally 391443 lines

file2


NODE_1_length_59711_cov_84.026979_g0_i0_1	56.54
NODE_1_length_59711_cov_84.026979_g0_i0_2	51.0
NODE_2_length_39753_cov_84.026979_g0_i0_1	12.6
NODE_2_length_49771_cov_84.026979_g0_i0_2	18.9

................................ .....................totally 391249 lines

I want to merge this two files keeping the first column the same.


NODE_1_length_59711_cov_84.026979_g0_i0_1	K02377	56.54
NODE_1_length_59711_cov_84.026979_g0_i0_2		51.0
NODE_2_length_39753_cov_84.026979_g0_i0_1	K02377	12.6
NODE_2_length_49771_cov_84.026979_g0_i0_2	K16554	18.9

The problem is as the first file have almost 190 more lines I cannot directly combine them as it will give wrong output. Is there any way I can combine these files by the common string from the first column?

Solution

Using GNU awk:

awk 'NR==FNR { map[$1]=$2;next } { map1[$1]=$2 } END { PROCINFO["sorted_in"]="@ind_str_asc";for (i in map) { print i"\t"map[i]"\t"map1[i] } }' file-1 file2

Explanation:

awk 'NR==FNR { 
               map[$1]=$2;                                  # Process the first file only and set up an array called map with the first space separated field as the index and the second the value
               next 
             } 
             { 
               map1[$1]=$2                                  # When processing the second file, set up an second array called map1 and use the first field as the index and the second the value.
             } 
         END { 
               PROCINFO["sorted_in"]="@ind_str_asc";         # Set the index ordering
               for (i in map) { 
                 print i"\t"map[i]"\t"map1[i]                # Loop through the map array and print the values along with the values in map1.
               } 
              }' file-1 file2

Answered By - Raman Sailopal

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, November 16, 2021

[SOLVED] Combining two columns from different files by common strings

Issue

Solution

Popular Posts

Labels