Issue
I'm trying to find every URL that matches the regex cwe.mitre.org.*.html, remove its .html extension, and leave every other URL unchanged.
Example input:
https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html
Expected output:
https://cwe.mitre.org/data/definitions/377
http://google.com/404.html
Is there a way to do this in sed or another tool?
I've tried
sed -Ei 's/cwe.mitre.org.*.html/<REPLACEMENT>/g' file.txt
but that won't work. Is there a way for <REPLACEMENT> to be a regular expression? The sed manual doesn't seem to suggest so.
EDIT: I was wrong about the sed manual. It does cover this; see section "5.7 Back-references and Subexpressions" of https://www.gnu.org/software/sed/manual/sed.html.
Solution
$ sed 's/\(cwe\.mitre\.org.*\)\.html/\1/' file
https://cwe.mitre.org/data/definitions/377
http://google.com/404.html
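The question's own attempt used -E and -i, so here is a minimal sketch of the same substitution in that form. It assumes GNU sed (BSD/macOS sed spells in-place editing as -i '' instead of -i) and rebuilds the two sample lines from the question in file.txt. With -E the capture group is written ( ) rather than \( \), and \1 still refers to it.

```shell
# Recreate the sample input from the question.
printf '%s\n' \
  'https://cwe.mitre.org/data/definitions/377.html' \
  'http://google.com/404.html' > file.txt

# ERE syntax (-E): ( ) is the capture group, \1 in the replacement
# is whatever it matched, so only the trailing ".html" is dropped.
# -i rewrites file.txt in place (GNU sed; BSD/macOS sed wants -i '').
sed -Ei 's/(cwe\.mitre\.org.*)\.html/\1/' file.txt

cat file.txt
```

Lines that don't contain cwe.mitre.org never match, so the google.com URL is left alone.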
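The \(...\) pair is a capture group, and \1 in the replacement inserts whatever that group matched, so everything except the trailing .html is kept. Search for "sed capture groups" for more detail.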
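One caveat, under the assumption that a single line can hold several URLs: the greedy .* in the answer then stretches from the first cwe.mitre.org to the last .html on the line, which could strip the extension from an unrelated URL. A sketch of one way around that, assuming URLs are separated by whitespace, limits each match to non-space characters and adds the g flag so every CWE link on the line is rewritten. The sample line is hypothetical.

```shell
# Hypothetical input: two CWE links and one unrelated link on one line.
line='see https://cwe.mitre.org/data/definitions/377.html and https://cwe.mitre.org/data/definitions/79.html or http://google.com/404.html'

# [^[:space:]]* cannot cross whitespace, so each match covers a single URL;
# the g flag repeats the substitution for every CWE URL on the line.
result=$(printf '%s\n' "$line" | sed -E 's/(cwe\.mitre\.org[^[:space:]]*)\.html/\1/g')

printf '%s\n' "$result"
```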
Answered By - Ed Morton Answer Checked By - Terry (WPSolving Volunteer)