Tuesday, March 15, 2022

[SOLVED] grep regexp to match space and/or TAB and '[:space:]' class

Issue

On CentOS 8, this grep expression does not return any matching lines:

% dmidecode -t memory | grep -E '^[ \t]+Size: [0-9]+'

However, this one returns the matching lines correctly (on the same distro):

% dmidecode -t memory | grep -E '^[[:space:]]+Size: [0-9]+'

What is the reason for this behaviour? As you can see, grep is invoked in extended regexp mode both times.


Solution

The issue here is the \t character sequence. In a grep regular expression (basic or extended dialect, it doesn't matter) this does not match a tab character. It's not treated as a special escape sequence the way it is by some other tools (including GNU grep using the PCRE dialect). Inside a bracket expression such as [ \t] the backslash is just an ordinary character, so that set matches a space, a backslash, or the letter t, and never a tab.

Witness:

# printf /does/ treat \t and \n special in a format
$ printf "a\tb\n" | grep "a[ \t]b" # No match
$ printf  "atb\n" | grep "a[ \t]b" # Match
atb
$ printf "a\tb\n" | grep "a[[:space:]]b" # Match
a     b
$ printf "a\tb\n" | grep "a[[:blank:]]b" # Match
a     b
$ printf "a\tb\n" | grep "a\sb" # Match, \s is a GNU grep extension
a     b
$ printf "a\tb\n" | grep -P "a\sb" # Match, GNU grep using PCRE
a     b
$ printf "a\tb\n" | grep -P "a[ \t]b" # Match, GNU grep using PCRE.
a     b
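
If you want to keep the original bracket expression rather than switch to [[:space:]] or [[:blank:]], one option is to put a real tab character into the pattern. The following is a minimal sketch, not the answerer's method: the sample lines only imitate dmidecode output and are made up for illustration, and the variable names tab, sample and matches are hypothetical.

```shell
#!/bin/sh
# Let printf produce a literal tab, so the pattern does not depend on
# grep understanding \t (it does not in BRE/ERE mode).
tab=$(printf '\t')

# Made-up sample input: one line indented with a tab, one with spaces,
# and one line that should not match at all.
sample=$(printf '\tSize: 8192 MB\n  Size: 4096 MB\nNo Module Installed\n')

# The bracket expression now holds a real space and a real tab character.
matches=$(printf '%s\n' "$sample" | grep -E "^[ ${tab}]+Size: [0-9]+")
printf '%s\n' "$matches"
```

This should print the two Size lines and skip the third. In bash or zsh the same tab can be written inline with ANSI-C quoting, e.g. grep -E $'^[ \t]+Size: [0-9]+', since the shell expands \t before grep ever sees the pattern.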


Answered By - Shawn
Answer Checked By - Mary Flores (WPSolving Volunteer)