Tuesday, November 16, 2021

[SOLVED] AWK to use multiple spaces as delimiter

Issue

I am using the command below to join two files on their first two columns.

awk 'NR==FNR{a[$1,$2]=substr($0,3);next} ($1,$2) in a{print $0, a[$1,$2] > "br0102_3.txt"}' br01.txt br02.txt

By default, awk splits fields on any run of whitespace. But a column in my files may contain a single space between two words, e.g.

File 1:

ABCD               TEXT1 TEXT2                     123123112312312312312312312312312312
BCDEFG             TEXT3TEXT4                      133123123123123123123123123125423423
QWERT              TEXT5TEXT6                      123123123123125456678786789698758567

File 2:

ABCD               TEXT1 TEXT2                     12312312312312312312312312312
BCDEFG             TEXT3TEXT4                      31242342342342342342342342343
MNHT               TEXT8 TEXT9                     31242342342342342342342342343

I want the result file to look like this:

ABCD               TEXT1 TEXT2                     123123112312312312312312312312312312 12312312312312312312312312312
BCDEFG             TEXT3TEXT4                      133123123123123123123123123125423423 31242342342342342342342342343
QWERT              TEXT5TEXT6                      123123123123125456678786789698758567
MNHT               TEXT8 TEXT9                     31242342342342342342342342343

Any hints?


Solution

awk accepts a regular expression as the value of FS, so you can specify one that matches two or more whitespace characters, for example -F '[[:space:]][[:space:]]+'. A single space inside a column then no longer splits the field, as the field counts below show.

$ awk '{print NF}' File2
4
3
4

$ awk -F '[[:space:]][[:space:]]+' '{print NF}' File2
3
3
3


Answered By - Etan Reisner
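
For the join in the question, the same separator can be combined with a two-pass lookup. The following is only a minimal sketch. It assumes the sample files are saved as br01.txt and br02.txt, that columns are always separated by at least two whitespace characters, and that unmatched lines from either file should be kept, as in the desired result. The array names (third, line, order, seen) are illustrative and do not come from the original command.

```shell
# Recreate the two sample files from the question.
cat > br01.txt <<'EOF'
ABCD               TEXT1 TEXT2                     123123112312312312312312312312312312
BCDEFG             TEXT3TEXT4                      133123123123123123123123123125423423
QWERT              TEXT5TEXT6                      123123123123125456678786789698758567
EOF

cat > br02.txt <<'EOF'
ABCD               TEXT1 TEXT2                     12312312312312312312312312312
BCDEFG             TEXT3TEXT4                      31242342342342342342342342343
MNHT               TEXT8 TEXT9                     31242342342342342342342342343
EOF

# Split on runs of two or more whitespace characters, so "TEXT1 TEXT2"
# stays a single field.
awk -F '[[:space:]][[:space:]]+' '
    # First file given (br02.txt): remember column 3 and the whole line per key.
    NR == FNR {
        key = $1 SUBSEP $2
        third[key] = $3
        line[key]  = $0
        order[++n] = key      # keep input order for the END block
        next
    }
    # Second file given (br01.txt): print each line, appending the
    # br02.txt value when the first two columns match.
    {
        key = $1 SUBSEP $2
        if (key in third) {
            print $0, third[key]
            seen[key] = 1
        } else {
            print $0
        }
    }
    # Finally, print the br02.txt lines that had no partner in br01.txt.
    END {
        for (i = 1; i <= n; i++)
            if (!(order[i] in seen))
                print line[order[i]]
    }
' br02.txt br01.txt > br0102_3.txt

cat br0102_3.txt
```

With the sample data this should reproduce the four lines of the desired result. Reading br02.txt first lets br01.txt stream through in its original order, and the order array is used because awk's for (key in array) loop does not guarantee any particular order.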