Tuesday, March 15, 2022

[SOLVED] Show all 4 digit numbers (but not the whole line and exclude special characters appended to number) using regex in bash

Issue

I want to find and print all 4-digit numbers (just the numbers, not the whole line) in a file using regex in bash.

My sample file looks like this:

12
123
1234
2345 foo foo foo 
foo foo 3456 foo foo
# 8912 foo foo foo foo
#7654
-8999
\6478
/9023
$7654
A3356
8349B
1439$
1762\
12345
123456
0000
0001

I would like my output to include only 4-digit numbers (any number that has a letter or special character attached to it must be ignored):

1234
2345
3456
8912
0000
0001

The closest I have been able to come to this is:

grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file_with_numbers.txt

which wrongly matches numbers that have special characters attached, and also prints the whole line when I only want the six values shown above:

1234
2345 foo foo foo foo
foo foo 3456 foo foo
# 8912 foo foo foo foo
#7654
-8999
\6478
/9023
$7654
A3356
1439$
1762\
0000
0001

Any suggestions on how I can get the exact desired output are appreciated. I am having trouble finding info for the appended special character exclusion as well as showing only the number and not the whole line.


Solution

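The groups (^|[^0-9]) and ($|[^0-9]) consume the non-digit character on either side of the number, so that character becomes part of the match. They also accept any non-digit, not just whitespace, which is why lines such as #7654 and 1439$ are matched.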
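
As a small illustration of that point, the sketch below runs the pattern from the question with -o on a single sample line, so only the matched text is printed. The variable name matched and the square brackets around the output are assumptions added here to make the captured spaces visible.

```shell
# -o prints only the matched text instead of the whole line.
# The groups around [0-9]{4} consume the neighbouring spaces as well.
matched=$(printf 'foo foo 3456 foo foo\n' | grep -Eo '(^|[^0-9])[0-9]{4}($|[^0-9])')

# Wrap the result in brackets so the leading and trailing space show up.
printf '[%s]\n' "$matched"
# prints: [ 3456 ]
```

So -o alone is not enough with this pattern: the surrounding characters come along with the digits.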

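Instead, you can use lookarounds that assert a whitespace boundary on the left and right without consuming any characters: (?<!\S) requires that the number is not preceded by a non-whitespace character, and (?!\S) requires that it is not followed by one.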

To use lookarounds, pass -P to enable Perl-compatible regular expressions (an option of GNU grep), and add -o to print only the matched part instead of the whole line.

grep -Po '(?<!\S)[0-9]{4}(?!\S)' file_with_numbers.txt

Output

1234
2345
3456
8912
0000
0001
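
If grep -P is not available, one possible alternative (not part of the original answer) is to let awk split each line on whitespace and print only the fields that are exactly four digits. This is a minimal sketch: the temporary file and the variable names sample and result are assumptions for illustration, and the file holds only a subset of the lines from the question.

```shell
# Hypothetical sample file with a subset of the lines from the question.
sample=$(mktemp)
printf '%s\n' '12' '1234' '2345 foo foo foo' 'foo foo 3456 foo foo' \
  '# 8912 foo foo foo foo' '#7654' '-8999' 'A3356' '8349B' '1439$' \
  '12345' '0000' '0001' > "$sample"

# awk splits each line into whitespace-separated fields;
# print every field that consists of exactly four digits.
result=$(awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9][0-9][0-9][0-9]$/) print $i }' "$sample")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$sample"
```

On this sample it prints 1234, 2345, 3456, 8912, 0000 and 0001, one per line. The four digits are spelled out as [0-9][0-9][0-9][0-9] rather than [0-9]{4} because some older awk implementations do not support interval expressions.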


Answered By - The fourth bird
Answer Checked By - Cary Denson (WPSolving Admin)