Issue
I am not asking if there is greedy regex in sed
, I already know that there is not. What I am asking is:
It's known that sed
is the best or one of the best stream editors that exists, but why didn't the developers of this tool implement the non-greedy regex? It looks simple comparing to all the things this tool can do.
Solution
History
Non-greedy matching is a feature of Perl-Compatible Regular Expressions. PCREs were only available as part of the Perl language until the 1997 implementation of libpcre, whereas the POSIX implementation of sed
was first introduced in 1992 -- and the implementation of standard-C-library regular expression routines which it references predates even that, having been published in 1988.
Standards-Body Definitions
The POSIX specification for sed
supports BRE; only BRE ("Basic Regular Expressions") and ERE ("Extended Regular Expressions") are specified in POSIX at all, and neither form contains non-greedy matching.
Thus, for PCRE support (or, otherwise, non-greedy matching support) to be standardized for inclusion in all sed
implementations, it would first need to be standardized in the POSIX regular expression definition.
However, it's highly unlikely that this would occur in practice (except as an extension to be present, or not, at the implementor's option), given the practical reasons for which PCRE support can be undesirable; see the following section:
Implementation Considerations
sed
is typically considered a "core tool", thus implemented with only minimal dependencies. Requiring libpcre in order to installsed
thus makes libpcre a part of your operating system that needs to be included even in images where size is at a premium (initrd/initramfs images, etc).- Multiple strategies for implementing regular expressions are available. The historical (very high-performance) implementation compiles the expression into a nondeterministic finite automata which can be executed in O(n) time against a string of size n given a fixed regular expression. The libpcre implementation uses backtracking -- which permits for easier implementation of features such as non-greedy matching, lookahead, and lookbehind -- but can often have far-worse-than-linear performance).
See https://swtch.com/~rsc/regexp/regexp1.html for a discussion of the performance advantages of Thompson NFAs over backtracking implementations.
Answered By - Charles Duffy Answer Checked By - Robin (WPSolving Admin)