Issue
Why are the beginning and end of a line matched by [.]*, and how can I avoid it?
It seems the pattern matches at word boundaries, but I am not sure whether this is by design or which specification it implements.
$ echo " a " | sed -n 's/[.]*/X/pg'
X XaX X
$ echo " a " | sed -n 's/[\b]*/X/pg'
X XaX X
Solution
Note that \b inside brackets has no special meaning. It is just a list of characters.
[.]* and [\b]* match zero or more of the character(s) in brackets.
So they match any runs of one or more of the characters, where the empty strings immediately preceding and following a run of the character(s) are just part of that run.
They also match any empty string that is not immediately preceded or followed by a run of the character(s).
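As a quick check of the two rules above, here is a small sketch that runs the same kind of substitution on inputs that do contain the listed characters. The helper name star_sub and the inputs a..b and a\bc are made up for illustration, and the outputs in the comments are what GNU sed produces; other implementations may treat empty matches differently.

```shell
#!/bin/sh
# Hypothetical helper: replace every match of "<bracket expression>*" with X.
# $1 is the bracket expression, $2 is the input line.
star_sub() {
    printf '%s\n' "$2" | sed "s/$1*/X/g"
}

# The run ".." is a single match; the empty strings on either side of it
# are part of that run and do not produce extra X's.
star_sub '[.]' 'a..b'      # XaXbX

# Inside brackets, \ and b are ordinary characters, so "\b" is one run.
star_sub '[\b]' 'a\bc'     # XaXcX

# None of the listed characters occur here, so only empty strings match.
star_sub '[.]' ' a '       # X XaX X
```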
Replacing each space with something more visible (the letter s), your input is: sas
None of the characters ., \ nor b appear in the string. So there are no runs of length one or longer. With this input, both [.]* and [\b]* are equivalent to "match empty string".
- the empty string between start of line and first s matches
- the empty string between first s and a matches
- the empty string between a and second s matches
- the empty string between second s and end of line matches
These 4 matches explain the Xs added to your sample output.
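As for how to avoid the empty matches: one option, sketched here on the assumption that only real runs of the character should be replaced, is to require at least one occurrence. Writing the bracket expression twice (or using the \{1,\} interval) does that in plain POSIX basic regular expressions. The helper name dot_runs and the input a..b. are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: replace each run of one or more literal dots with X.
# "[.][.]*" is one dot followed by zero or more dots, so it can never
# match the empty string.
dot_runs() {
    printf '%s\n' "$1" | sed 's/[.][.]*/X/g'
}

dot_runs ' a '      # prints " a " unchanged: there are no dots to match
dot_runs 'a..b.'    # prints "aXbX"

# The same idea written with a POSIX interval expression.
printf '%s\n' 'a..b.' | sed 's/[.]\{1,\}/X/g'    # prints "aXbX"
```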
Using \b to mean word boundary is not standard, although some versions of sed accept it (or the related \< and \>).
It is safer not to use this extension, and certainly not with *.
Even versions of sed that appear to support it produce non-intuitive and inconsistent results.
For example, with GNU sed 4.8:
$ echo ,aa, | sed 's/\b/x/g'
,xaax,
$ echo ,aa, | sed 's/\b*/x/g'
,aa,
$ echo ,aa, | sed 's/\b\{1,\}/x/g'
sed: -e expression #1, char 14: Invalid preceding regular expression
$ echo ,aa, | sed 's/\(\b\)\{1,\}/x/g'
,xaax,
With busybox sed 1.30.1:
$ echo ,aa, | busybox sed 's/\b/x/g'
,xaxa,
$ echo ,aa, | busybox sed 's/\b*/x/g'
,aa,
$ echo ,aa, | busybox sed 's/\b\{1,\}/x/g'
sed: bad regex '\b\{1,\}': Invalid preceding regular expression
$ echo ,aa, | busybox sed 's/\(\b\)\{1,\}/x/g'
,xaxa,
Even other programs like Perl require care:
$ echo ,aa, | perl -ple 's/\b/x/g'
,xaax,
$ echo ,aa, | perl -ple 's/\b*/x/g'
x,xaxax,x
$ echo ,aa, | perl -ple 's/\b{1,}/x/g'
'1,' is an unknown bound type in regex; marked by <-- HERE in m/\b{1, <-- HERE }/ at -e line 1.
$ echo ,aa, | perl -ple 's/(\b){1,}/x/g'
,xaax,
$ echo ,aa, | perl -ple 's/\b+/x/g'
,xaax,
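Given these differences, one portable alternative is to describe the word itself rather than its boundaries. The sketch below assumes that a "word" is a run of letters, digits and underscores, and reproduces the ,xaax, result using only POSIX basic regular expression syntax. The helper name mark_words is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: put an x on each side of every word without using \b.
# [[:alnum:]_][[:alnum:]_]* matches one or more word characters, and & in
# the replacement stands for the text that was matched.
mark_words() {
    printf '%s\n' "$1" | sed 's/[[:alnum:]_][[:alnum:]_]*/x&x/g'
}

mark_words ',aa,'     # prints ",xaax,"
mark_words 'ab cd'    # prints "xabx xcdx"
```

Because the pattern always consumes at least one character, it does not depend on how a given sed handles empty matches.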
Answered By - jhnc Answer Checked By - Gilberto Lyons (WPSolving Admin)