Issue
I have a file with lines containing pairs of letter strings such as
ABXF\\CDYG
and a pair of target letters, for example X and Y (the target letters may vary). I would like to find all the lines where the target letters are in the same position (in this example, both are in position 3 of their respective letter strings). The position could be anywhere, including the very first or the very last. The two letter strings always have the same length.
How could I do such a search with regular expressions (here, with Perl-style grep)?
Solution
If that's okay with you, here's a shell script that might do the job.
#! /bin/sh
# Usage: ./scriptname XY INPUTFILE
Target="${1:?missing target letters}"
File="${2:?missing input filename}"
Previous=''
test "${#Target}" -eq 2 || { echo 'please provide two target letters' >&2; exit 1; }
test -r "$File" || { echo "cannot read file \"$File\"" >&2; exit 1; }
# List every occurrence of either letter as line:offset:letter (GNU grep).
grep -n -b -o -e "${Target%?}\\|${Target#?}" "$File" \
| while read -r Line
  do
    if test "${Line%%:*}" != "${Previous%%:*}"
    then
      # First hit on a new input line: remember it.
      Previous="$Line"
    else
      # Second hit on the same input line: compare it with the first one.
      printf '%s:%s\n' "$Previous" "$Line" \
      | { IFS=':' read -r Line Pos1 Char1 _ Pos2 Char2
          # 6 = 4 letters + 2 backslashes, so this assumes 4-letter strings.
          test "$(( Pos1 == Pos2 - 6 ))" -eq 1 \
          && test "$Char1" != "$Char2" \
          && echo "match at line $Line"
        }
      Previous=''
    fi
  done
Based on the following input data:
ABXF\\CDYG
ZETX\\FCBA
XHCB\\YEIH
BYCT\\ABCD
CYTZ\\AXVH
ABXZ\\CDXV
when you invoke the script like this:
./scriptname XY INPUTFILE
it produces this output:
match at line 1
match at line 3
match at line 5
Explanation
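The script uses the -o, -b and -n grep options.
- '-n' prefixes every match with its line number
- '-b' prefixes every match with its byte offset (0-based; combined with '-o', GNU grep reports the offset of the match itself, counted from the start of the file)
- '-o' prints only the matched text, one output line per occurrence in a given line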
Thus grep -n -b -o -e 'X\|Y' INPUTFILE produces (first lines shown):
1:2:X
1:8:Y
2:14:X
3:22:X
3:28:Y
4:34:Y
(line:offset:matched expression)
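To make the pattern construction easier to follow, here is a minimal sketch of how the two-letter argument can be turned into the 'X\|Y' alternation with parameter expansion. The function name build_pattern is hypothetical and only used for illustration; the grep call assumes GNU grep, as in the script.

```shell
#!/bin/sh
# build_pattern is a hypothetical helper name used only for this illustration.
build_pattern() {
    Target="$1"
    # ${Target%?} drops the last character, ${Target#?} drops the first one,
    # so "XY" yields X and Y, joined by the GNU BRE alternation \|
    printf '%s\\|%s\n' "${Target%?}" "${Target#?}"
}

# Show the pattern, then feed it to grep on the first two sample lines.
build_pattern XY
printf '%s\n' 'ABXF\\CDYG' 'ZETX\\FCBA' \
| grep -n -b -o -e "$(build_pattern XY)"
```

With GNU grep, the second command should print the same 1:2:X, 1:8:Y and 2:14:X records as the listing above.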
The script only parses that output, assuming that:
- IF PreviousLine == CurrentLine
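- AND PreviousOffset + 6 == CurrentOffset (6 = the 4 letters of one string plus the 2 backslashes of the separator, so this assumes 4-letter strings)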
- AND the matched letters are different
- THEN there's a match
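The three conditions can be tried in isolation on two grep records. This is only a sketch: check_pair is a hypothetical helper, and its optional third argument (the gap between the two offsets) is an assumption added here for illustration, defaulting to the 6 used by the script.

```shell
#!/bin/sh
# check_pair is a hypothetical helper; it takes two "line:offset:letter"
# records and an optional gap (default 6 = 4 letters + 2 backslashes).
check_pair() {
    Gap="${3:-6}"
    printf '%s:%s\n' "$1" "$2" | {
        IFS=':' read -r Line1 Pos1 Char1 Line2 Pos2 Char2
        test "$Line1" -eq "$Line2" \
            && test "$(( Pos1 + Gap ))" -eq "$Pos2" \
            && test "$Char1" != "$Char2" \
            && echo "match at line $Line1"
    }
}

# Same line, offsets 2 and 8, different letters: prints "match at line 1"
check_pair '1:2:X' '1:8:Y'
# Offsets fit, but the same letter occurs twice: prints "no match"
check_pair '6:57:X' '6:63:X' || echo 'no match'
```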
Tested under Debian 11 with GNU grep.
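Since the check above relies on a fixed gap of 6, it only fits 4-letter strings. As a possible generalization (a sketch, not part of the tested script), the comparison could be done character by character with awk, which works for any string length. The function name same_pos is hypothetical; it reads the data on standard input and assumes each line holds two equally long strings separated by two backslashes.

```shell
#!/bin/sh
# same_pos is a hypothetical helper: same_pos LETTER1 LETTER2 < INPUTFILE
same_pos() {
    awk -v a="$1" -v b="$2" '
    {
        sep = index($0, "\\\\")      # position of the two-backslash separator
        if (sep == 0) next           # skip lines without a separator
        left  = substr($0, 1, sep - 1)
        right = substr($0, sep + 2)
        for (i = 1; i <= length(left); i++) {
            l = substr(left, i, 1)
            r = substr(right, i, 1)
            # Accept either order, as the grep-based script does
            if ((l == a && r == b) || (l == b && r == a)) {
                print "match at line " NR
                next
            }
        }
    }'
}

# Usage with the first three sample lines
printf '%s\n' 'ABXF\\CDYG' 'ZETX\\FCBA' 'XHCB\\YEIH' | same_pos X Y
```

On the six sample lines this sketch reports lines 1, 3 and 5, like the script above. It does not depend on the GNU-specific -b and -o grep options.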
Hope that helps.
Answered By - Grobu Answer Checked By - Pedro (WPSolving Volunteer)