Issue
Given a file containing this string:
IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@IT1*1*CS*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@
The goal is to extract the following:
IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@
With the criteria being:
- The IT1 "line" must contain *EA*
- The REF line must contain BAR
Some notes for consideration:
- "@" can be thought of as a line break
- A "group" of lines contains lines starting with IT1 and ending with REF
- I am running GNU grep 3.7.
The goal is to select the "group" of lines meeting the criteria.
I tried the following:
grep -oP "IT1[^@]*EA[^@]*@.*REF[^@]*BAR[^@]*@" file.txt
But it captures characters from the beginning of the example.
I also tried using lookarounds:
grep -oP "(?<=IT1[^@]*EA[^@]*@).*?(?=REF[^@]*BAR[^@]*@)" file.txt
But my version of grep returns:
grep: lookbehind assertion is not fixed length
Solution
Your issue is that .* will match characters from the first IT1 with EA to the last REF with BAR. You need to ensure the match doesn't go past the next IT1, which you can do by replacing .* with a tempered greedy token, (?:(?!@IT1).)* :
IT1[^@]*EA[^@]*@(?:(?!@IT1).)*REF[^@]*BAR[^@]*@
This will only match from an IT1 to its corresponding REF.
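As a rough sketch of how this could be run with GNU grep, the snippet below recreates the sample input from the question and applies the pattern with -oP. The file name file.txt is taken from the question; the printf line and the result variable are only there for illustration. The pattern is single-quoted on the assumption that an interactive bash session might otherwise try to history-expand the "!" in the lookahead.

```shell
# Recreate the sample input from the question (illustrative only).
printf '%s\n' 'IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@IT1*1*CS*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@' > file.txt

# (?:(?!@IT1).)* consumes one character at a time, but only while the
# text ahead is not "@IT1", so the match cannot run into the next group.
result=$(grep -oP 'IT1[^@]*EA[^@]*@(?:(?!@IT1).)*REF[^@]*BAR[^@]*@' file.txt)

printf '%s\n' "$result"
# prints: IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@
```

Only the third group is printed: the first and fourth have a REF line without BAR, and the second has an IT1 line without EA.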
Answered By - Nick Answer Checked By - Katrina (WPSolving Volunteer)