Issue
I have a script that reads a log file line-by-line. I need to extract the text between two substrings, if they exist in the line my script is currently reading.
For instance, if a line has:
some random text here substring A abc/def/ghi substring B
I need to extract the text abc/def/ghi that is between substring A and substring B and store it in a variable. How would I go about doing this?
I looked through "Extract substring in Bash" but can't find anything that exactly matches my use case.
Solution
Bash provides parameter expansion with substring removal that allows you to trim through "substring A" from the front, and then trim "substring B" from the back, leaving "abc/def/ghi". For example, you can do:
ssa="substring A" ## substrings to find text between
ssb="substring B"
line="some random text here substring A abc/def/ghi substring B"
text="${line#*${ssa}}" ## trim through $ssa from the front (left)
text="${text%${ssb}*}" ## trim through $ssb from the back (right)
echo $text ## output result (unquoted, so word splitting drops the spaces left around the text)
Example Output
abc/def/ghi
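Since the question mentions reading a log file line-by-line and only extracting when both substrings are present, here is a minimal sketch of how the two expansions above might be wrapped for that case. The function name extract_between and the sample lines in the here-document are hypothetical, made up for illustration; with a real log you would redirect from the file instead.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the text between marker $2 and marker $3
# found in string $1; return 1 if the markers are not both present in order.
extract_between() {
    local line="$1" start="$2" end="$3" text
    case $line in
        *"$start"*"$end"*) ;;       # both markers present, in order
        *) return 1 ;;              # otherwise skip this line
    esac
    text=${line#*"$start"}          # trim through $start from the front
    text=${text%"$end"*}            # trim from $end onward at the back
    text=${text# }                  # drop one leading space, if any
    text=${text% }                  # drop one trailing space, if any
    printf '%s\n' "$text"
}

# Sample lines stand in for the log file (an assumption for this sketch).
# With a real file, end the loop with:  done < "$logfile"
while IFS= read -r line; do
    if result=$(extract_between "$line" "substring A" "substring B"); then
        echo "found: $result"
    fi
done <<'EOF'
some random text here substring A abc/def/ghi substring B
a line without either marker
EOF
```

Quoting the markers inside the expansions ("$start", "$end") keeps any glob characters they might contain from being treated as patterns, and stripping the single leading and trailing space means the result can be quoted safely later on.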
The two basic forms for trimming from the front of a string, and the two for trimming from the back, are:
${var#pattern} # Strip shortest match of pattern from front of $var
${var##pattern} # Strip longest match of pattern from front of $var
${var%pattern} # Strip shortest match of pattern from back of $var
${var%%pattern} # Strip longest match of pattern from back of $var
Where pattern can contain globbing characters such as '*' and '?'. Look things over and let me know if you have any further questions.
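To see how the shortest and longest forms differ, it may help to run all four against one string. This is just a small illustration; the variable name path and its value are made up for the example.

```bash
path="/var/log/app/error.log.1"   # hypothetical sample value

echo "${path#*/}"     # shortest front match of '*/' removed -> var/log/app/error.log.1
echo "${path##*/}"    # longest front match of '*/' removed  -> error.log.1
echo "${path%.*}"     # shortest back match of '.*' removed  -> /var/log/app/error.log
echo "${path%%.*}"    # longest back match of '.*' removed   -> /var/log/app/error
echo "${path%?}"      # '?' matches exactly one character    -> /var/log/app/error.log.
```

The single-character forms (# and %) stop at the nearest match, while the doubled forms (## and %%) keep going to the farthest one.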
Using BASH_REMATCH
BASH_REMATCH is an internal array that contains the results of matching [[ text =~ REGEX ]]. ${BASH_REMATCH[0]} is the total text matched by REGEX, and ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]}, etc. are the portions matched by the captures between (...) within the regular expression (of which you can provide multiple captures).
Using the same setup as above, you could modify the script to replace the parameter expansions applied to text with:
regex="^.*${ssa} ([^ ]+) ${ssb}.*$" ## REGEX to match with (..) capture
[[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]}"
Where the regular expression in $regex will match the entire line, capturing what is between $ssa and $ssb. The complete modified script would be:
ssa="substring A" ## substrings to find text between
ssb="substring B"
line="some random text here substring A abc/def/ghi substring B"
regex="^.*${ssa} ([^ ]+) ${ssb}.*$" ## REGEX to match with (..) capture
[[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]}"
(same output)
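As a small sketch of the multiple-capture case mentioned earlier, the regular expression below uses two (...) groups to split the text between the substrings at its first slash. The pattern and the split are assumptions chosen for illustration, not something the question asked for.

```bash
#!/usr/bin/env bash

line="some random text here substring A abc/def/ghi substring B"
regex='substring A ([^/ ]+)/([^ ]+) substring B'   # two (...) captures

if [[ $line =~ $regex ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # substring A abc/def/ghi substring B
    echo "capture 1:   ${BASH_REMATCH[1]}"   # abc
    echo "capture 2:   ${BASH_REMATCH[2]}"   # def/ghi
else
    echo "no match" >&2
fi
```

Testing the result of [[ ... ]] with if/else also gives you a natural place to handle lines that contain neither substring.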
Both methods are fully explained in man 1 bash. Use whichever fits the circumstance you are faced with. I have always found parameter expansion a bit more intuitive (and you can incrementally whittle text down to just about anything you need). However, extended regular expression matching is a powerful alternative to parameter expansion.
Answered By - David C. Rankin