Saturday, October 29, 2022

[SOLVED] Awk iteratively replacing strings from array

Issue

I've recently been trying to do the following in awk: we have two files (F1.txt and F2.txt.gz). While streaming from the second one, I want to replace every occurrence of an entry from F1.txt with a substring of that entry. I got this far:

zcat F2.txt.gz |
    awk 'NR==FNR {a[$1]; next}
    {for (i in a)
         $0=gsub(i, substr(i, 0, 2), $0) #this does not work of course
    }
    {print $0}
' F1.txt -

I was wondering how to do this properly in awk. Thanks!


Solution

Please correct my assumptions if they are wrong.

You have two files, and the first one contains a set of entries. Wherever one of those entries appears as a word in the second file, replace it with its first two characters.

Example:

==> file1 <==
Azerbaijan
Belarus
Canada

==> file2 <==
Caspian sea is in Azerbaijan
Belarus is in Europe
Canada is in metric system.


$ awk 'NR==FNR {a[$1]; next} 
               {for(i=1;i<=NF;i++) 
                   if($i in a) $i=substr($i,1,2)}1' file1 file2

Caspian sea is in Az
Be is in Europe
Ca is in metric system.

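Note that string indices start at 1 in awk, so the call should be substr(s, 1, 2) rather than substr(s, 0, 2). With a start of 0, gawk for instance returns only the first character.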
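The command above only replaces whole whitespace-separated fields, so an entry followed by punctuation (e.g. "Azerbaijan.") is left untouched. If entries should also be replaced inside longer strings, as the original attempt with gsub suggests, a minimal sketch could look like the one below. It relies on two facts about gsub: it modifies its target (by default $0) in place, and it returns the number of replacements, which is why assigning its result back to $0 in the question overwrites the line with a count. The sample data and the tmp and result shell variables are made up for illustration. Each entry is treated as a regular expression, so this assumes the entries contain no regex metacharacters.

```shell
# Hypothetical sample data, modeled on the example files above.
tmp=$(mktemp -d)
printf '%s\n' Azerbaijan Belarus Canada > "$tmp/file1"
printf '%s\n' 'Caspian sea is in Azerbaijan.' 'Belarus-Canada trade' > "$tmp/file2"

result=$(awk 'NR==FNR { a[$1]; next }   # first file: remember each entry as an array key
    {
        for (i in a)                    # iteration order over the array is unspecified
            gsub(i, substr(i, 1, 2))    # edits $0 in place; the returned count is ignored
        print
    }' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

For the gzipped input from the question, the same awk program should work when the second file is read from standard input, i.e. zcat F2.txt.gz | awk '...' F1.txt -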



Answered By - karakfa
Answer Checked By - Cary Denson (WPSolving Admin)