Issue
grep is great at finding lines that match a pattern. But what if you have a file with a single extremely long line (say a 100MB file), and you want to find chunks within it that match a pattern?
For each match, you'd want to print the character offset, and the matched string, with extra characters on either side for context.
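In Python, you could write something like this (the max() call is the boundary check; without it, a negative start index would count from the end of the string):
import re
[(m.start(), s[max(m.start() - 50, 0):m.end() + 50]) for m in re.finditer(regex, s)]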
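A slightly fuller Python sketch of the same idea, wrapped in a function so that the left-edge boundary check and the context width are explicit. The function name matches_with_context and its width parameter are illustrative, not from the original post. Offsets are 0-based, as returned by m.start().

```python
import re


def matches_with_context(s, pattern, width=50):
    """Return (offset, snippet) pairs for every match of pattern in s.

    offset is the 0-based start of the match, as given by m.start().
    snippet is the matched text plus up to `width` characters of context
    on each side. Works for str, or for bytes with a bytes pattern.
    """
    results = []
    for m in re.finditer(pattern, s):
        # Clamp the left edge at 0: a negative slice index would count
        # from the end of the string instead of stopping at the start.
        start = max(m.start() - width, 0)
        # No clamp is needed on the right; slicing past the end just stops there.
        results.append((m.start(), s[start:m.end() + width]))
    return results


if __name__ == "__main__":
    text = "x" * 60 + "needle" + "y" * 10
    for offset, snippet in matches_with_context(text, r"needle", width=5):
        print(offset, snippet)  # prints: 60 xxxxxneedleyyyyy
```

For a 100MB file, reading the whole thing into a string works if memory allows. The Python mmap documentation also notes that the re module can search a memory-mapped file directly, in which case the pattern has to be bytes (e.g. rb"needle").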
Is there a way to do the equivalent using standard Linux command-line tools?
Solution
To print each match's character offset together with the matched text and up to 50 characters of context on either side, you can use awk:
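awk '{
    i = 1  # search start; awk string positions are 1-based
    while (i <= length($0) && match(substr($0, i), /regex/)) {
        off = i + RSTART - 1             # absolute offset of the match in the line
        start = off > 50 ? off - 50 : 1  # up to 50 chars of context before it
        # print the offset, then the match with up to 50 chars on each side
        print off, substr($0, start, off - start + RLENGTH + 50)
        # move past the match, by at least 1 char so an empty match cannot loop forever
        i = off + (RLENGTH > 0 ? RLENGTH : 1)
    }
}' file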
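For comparison, here is a rough Python port of the awk loop above, assuming you want the same 1-based offsets that awk's RSTART gives. The function name awk_style_matches is hypothetical. One design difference: rather than copying the rest of the line with substr on every iteration, it passes a start position to Pattern.search, which avoids repeatedly copying a very long line. A caveat of that choice is that ^ then only matches at the real start of the string, whereas the awk version re-anchors ^ at every search position.

```python
import re


def awk_style_matches(line, pattern, width=50):
    """Mirror the awk loop: return (offset, snippet) pairs with 1-based offsets."""
    regex = re.compile(pattern)
    results = []
    pos = 0  # 0-based position where the next search starts
    while pos <= len(line):
        m = regex.search(line, pos)  # search from pos without slicing the line
        if m is None:
            break
        start = max(m.start() - width, 0)  # clamp the left context at 0
        # Report a 1-based offset, like awk's RSTART-derived value.
        results.append((m.start() + 1, line[start:m.end() + width]))
        # Step past the match; advance by at least one character so an
        # empty match (e.g. pattern "a*") cannot loop forever.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return results


if __name__ == "__main__":
    for offset, snippet in awk_style_matches("aXbX", "X", width=1):
        print(offset, snippet)  # prints "2 aXb" then "4 bX"
```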
Answered By - oguz ismail Answer Checked By - Marie Seifert (WPSolving Admin)