Saturday, April 9, 2022

[SOLVED] Extract first position of a regex match grep

Issue

Good morning everyone,

I have a text file containing multiple lines. I want to find a regular pattern inside it and print its position using grep.

For example:

ARTGHFRHOPLIT
GFRTLOPLATHLG
TGHLKTGVARTHG

I want to find L[any_letter]T in the file and print the position of the L together with the three-letter code. In this case the result would be:

11 LIT
8 LAT
4 LKT

I wrote a grep command, but it doesn't return what I need. The command is:

grep -E -boe "L.T" file.txt

It returns:

11:LIT
21:LAT
30:LKT

Any help would be appreciated!!


Solution

Awk suits this better:

awk 'match($0, /L[[:alpha:]]T/) {
   print RSTART, substr($0, RSTART, RLENGTH)
}' file

11 LIT
8 LAT
4 LKT

This is assuming only one such match per line.
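As a minimal, self-contained sketch of the one-match-per-line command: match() sets RSTART to the 1-based position of the match within the current line, so the count restarts on every line. grep -b, by contrast, reports 0-based byte offsets counted from the start of the file, so earlier lines and their newlines are included. The temporary file and the variable names below are illustrative assumptions, not part of the original question.

```shell
# Recreate the sample input from the question in a temporary file
# (the temp file and variable names are only for illustration).
sample=$(mktemp)
printf '%s\n' ARTGHFRHOPLIT GFRTLOPLATHLG TGHLKTGVARTHG > "$sample"

# RSTART: 1-based position of the match within the current line.
# RLENGTH: length of the matched text.
first_match_out=$(awk 'match($0, /L[[:alpha:]]T/) {
   print RSTART, substr($0, RSTART, RLENGTH)
}' "$sample")

printf '%s\n' "$first_match_out"
rm -f "$sample"
```

With the three sample lines this should print 11 LIT, 8 LAT and 4 LKT, one per line.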


If there can be multiple (possibly overlapping) matches per line, then use:

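awk '{
   n = 0                                  # number of characters already cut from the line
   while (match($0, /L[[:alpha:]]T/)) {
      n += RSTART                         # position of this match in the original line
      print n, substr($0, RSTART, RLENGTH)
      $0 = substr($0, RSTART + 1)         # continue one character past the match start
   }
}' file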
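To illustrate the loop on a line that actually contains overlapping matches, here is a small sketch. The input string XLATLLTT is a made-up example, not taken from the question, and the variable name is hypothetical. Each pass adds RSTART to the running offset n and then discards everything up to and including the first character of the match, so a match that begins inside the previous one is still found.

```shell
# XLATLLTT is an invented test line: LAT starts at 2, LLT at 5, LTT at 6.
all_matches_out=$(printf '%s\n' XLATLLTT | awk '{
   n = 0
   while (match($0, /L[[:alpha:]]T/)) {
      n += RSTART
      print n, substr($0, RSTART, RLENGTH)
      $0 = substr($0, RSTART + 1)
   }
}')

printf '%s\n' "$all_matches_out"
```

For this input the loop should print 2 LAT, 5 LLT and 6 LTT; the last two overlap (LLT covers positions 5 to 7, LTT covers 6 to 8).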


Answered By - anubhava
Answer Checked By - Senaida (WPSolving Volunteer)