Issue
I have a text file where each line consists of a series of numbers separated by spaces followed by a word. The numbers consist of only the digits 1 through 6 and the digits within each number are ordered and unique. The word at the end of each line is not important.
For example:
2356 345 12345 4 4 1 6 gripped
12346 2 2346 123456 2356 56 245 12346 13456 12456 misidentifies
1256 345 24 12456 12356 123456 12356 356 1256 5 26 swine
would all be valid lines within my file.
I need to write a grep command which uses a regex to match all lines which contain at least 8 numbers which have a 1 or a 6. That means the line
346 1245 136 23456 5 1356 123456 5 123456 123456 octettes
is a match (346, 1245, 136, 23456, 1356, 123456, 123456, 123456 are 8 numbers) but the line
1 236 145 23 16 4 12356 4 3 packers
is not a match (1, 236, 145, 16, 12356 are only 5 numbers).
Note: the regex does not have to match the full line. grep returns all lines with a match present somewhere so the only important part is having the minimum 8 matches.
I have constructed this regex: ((?:(?:123456)|(?:1[2-5]*)|(?:[2-5]*6)) )
It matches all of the numbers which meet the condition and does not count 123456 twice. My issue is now with counting the number of occurrences. A {8,} would be sufficient if all of the matching numbers came one after another, but sometimes there are one or more other numbers in between the matches (e.g. 134 4 245 1245).
I have tried a lot of things including putting [2-5]{0,5}, [2-5]* or .* in the matching group to be repeated (with a {8,}) but nothing seemed to work. They are either not matching correctly or giving a catastrophic backtracking error.
I am pretty new to regex so I might have misunderstood how some things work. I know I need to modify my capturing group for my {8,} quantifier to work but I do not know how.
Regex101 link with more examples and my current (partial) solution here.
Solution
If there are single spaces only, you might use
^(?:(?:[2345]+ )*[2345]*[16][1-6]* ){7}(?:[2345]+ )*[2345]*[16]
The pattern matches
^
Start of string
(?:
Non capture group to repeat as a whole
(?:[2345]+ )*
Optionally match numbers without 1 or 6
[2345]*[16][1-6]*
Match a number with 1 or 6, followed by a space
){7}
Close the non capture group and repeat 7 times
(?:[2345]+ )*[2345]*[16]
The 8th match
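The same pieces can be assembled from shell variables, which may make it easier to see how the pattern is built. This is only a sketch: the variable names skip, hit and pattern are made up for illustration, single spaces are assumed as separators, and plain groups stand in for (?:...) because grep -E (POSIX ERE) has no non-capturing groups.

```shell
# Hypothetical variable names, chosen for illustration only.
# Assumes numbers are separated by exactly one space.
skip='([2345]+ )*'        # zero or more numbers without 1 or 6, each followed by a space
hit='[2345]*[16][1-6]*'   # one number that contains a 1 or a 6

# Seven "skip, hit, space" repetitions, then an 8th hit that only needs
# to reach its first 1 or 6 (the rest of the line is irrelevant).
pattern="^(${skip}${hit} ){7}${skip}[2345]*[16]"

printf '%s\n' "$pattern"
```

The resulting string can be passed to grep -E "$pattern" file.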
Example using grep -E matching 1 or more spaces or tabs:
grep -E "^(([2345]+[[:blank:]]+)*[2345]*[16][1-6]*[[:blank:]]+){7}([2345]+[[:blank:]]+)*[2345]*[16]" file
Example using grep -P matching 1 or more horizontal whitespace characters:
grep -P "^(?>(?:[2345]+\h+)*+[2345]*[16][1-6]*\h+){7}(?:[2345]+\h+)*+[2345]*[16]" file
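As a quick sanity check, the grep -E form can be run against the two lines from the question, one that should match and one that should not. This is a minimal sketch; the variable names are hypothetical, and the input is piped in rather than read from a file.

```shell
# Hypothetical variable names; the two lines are the examples from the question.
pattern='^(([2345]+[[:blank:]]+)*[2345]*[16][1-6]*[[:blank:]]+){7}([2345]+[[:blank:]]+)*[2345]*[16]'

line_ok='346 1245 136 23456 5 1356 123456 5 123456 123456 octettes'  # 8 numbers with a 1 or 6
line_no='1 236 145 23 16 4 12356 4 3 packers'                        # only 5 such numbers

# grep prints only the lines on which the pattern matches.
matched=$(printf '%s\n' "$line_ok" "$line_no" | grep -E "$pattern")

printf '%s\n' "$matched"
```

Only the line ending in octettes should be printed.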
Answered By - The fourth bird Answer Checked By - Robin (WPSolving Admin)