Sunday, April 10, 2022

[SOLVED] Search with regex but replace only a portion of the string with sed

Issue

I'm trying to strip the .html extension from every URL that matches the regex cwe.mitre.org.*.html, while leaving all other URLs unchanged.

Example:

https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html

Expectation:

https://cwe.mitre.org/data/definitions/377
http://google.com/404.html

Is there a way to do this in sed or another tool?

I've tried sed -Ei 's/cwe.mitre.org.*.html/<REPLACEMENT>/g' file.txt, but that doesn't work. Is there a way for <REPLACEMENT> to be a regular expression? The sed manual doesn't seem to suggest so.

EDIT: I was wrong about the sed manual. It does cover this; see the "5.7 Back-references and Subexpressions" section of https://www.gnu.org/software/sed/manual/sed.html.


Solution

$ sed 's/\(cwe\.mitre\.org.*\)\.html/\1/' file
https://cwe.mitre.org/data/definitions/377
http://google.com/404.html

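The \(...\) pair captures the text it matches, and \1 in the replacement puts that captured text back, so only the trailing .html is dropped. For more background, search for "sed capture groups".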
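
As a variation on the command above, here is a minimal sketch of the same capture-group idea written with extended regex syntax (-E, as in the question's attempt). The sample input is just the two URLs from the question. The [^[:space:]]* part is a choice made for this sketch, not part of the original answer: it assumes URLs contain no whitespace, so that the match cannot run on into a later URL on the same line.

```shell
# Sample input: the two URLs from the question.
input='https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html'

# -E switches to extended regexes, so the group is written ( ... )
# instead of \( ... \).
# [^[:space:]]* keeps the match inside one whitespace-delimited URL
# (an assumption of this sketch), and \1 puts the captured text back
# without the trailing .html. The g flag handles several URLs per line.
result=$(printf '%s\n' "$input" |
  sed -E 's/(cwe\.mitre\.org[^[:space:]]*)\.html/\1/g')

printf '%s\n' "$result"
```

To edit a file in place instead of printing, GNU sed accepts -i (as in sed -Ei '...' file.txt), while BSD/macOS sed expects a backup suffix argument after -i, which may be empty (-i '').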



Answered By - Ed Morton
Answer Checked By - Terry (WPSolving Volunteer)