Issue
What I've got:
<ul>
<li><a href="https://example.com.com/link-1"></li>
<li><a href="https://example.com.com/link-2"></li>
<li><a href="https://example.com.com/link-3" ></li>
<!-- many more items here -->
</ul>
Desired end result:
<ul>
<li><a href="link-1.html"></li>
<li><a href="link-2.html"></li>
<li><a href="link-3.html" ></li>
<!-- many more items here -->
</ul>
Currently I've come up with something like:
sed 's/https:\/\/example.com.com//g' test.txt | sed 's/" *>/.html">/g'
But this is clearly (a) inefficient and (b) won't work in place (i.e. with sed -i when used in conjunction with find, for example).
What would be a better approach?
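Running the two-stage pipeline above on the sample lines shows one more wrinkle. For illustration the lines are held in a shell variable rather than read from test.txt, and the "many more items" comment is omitted. Because the first expression strips only the scheme and host, the leading slash of the path survives, and the second expression also drops the space before the closing > on the link-3 line.

```shell
#!/bin/sh
# Sample lines from the question (held in a variable instead of test.txt).
sample='<ul>
<li><a href="https://example.com.com/link-1"></li>
<li><a href="https://example.com.com/link-2"></li>
<li><a href="https://example.com.com/link-3" ></li>
</ul>'

# The question's pipeline, unchanged apart from its input source.
out=$(printf '%s\n' "$sample" | sed 's/https:\/\/example.com.com//g' | sed 's/" *>/.html">/g')
printf '%s\n' "$out"
# Prints:
# <ul>
# <li><a href="/link-1.html"></li>
# <li><a href="/link-2.html"></li>
# <li><a href="/link-3.html"></li>
# </ul>
```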
Solution
You could use a capturing group to avoid the second sed invocation, like this:
sed -e 's%https://[^"]*/\([^"]*\)%\1.html%'
Using % as the delimiter removes the need to escape the forward slashes.
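A minimal sketch of the single-call substitution above, run on the question's sample (again held in a shell variable for illustration). The greedy [^"]* backtracks only as far as the last / before the closing quote, so the \( \) group captures just the final path segment, which is then written back with .html appended.

```shell
#!/bin/sh
# Sample lines from the question.
sample='<ul>
<li><a href="https://example.com.com/link-1"></li>
<li><a href="https://example.com.com/link-2"></li>
<li><a href="https://example.com.com/link-3" ></li>
</ul>'

# One sed process: "%" is the s-command delimiter, so "/" needs no escaping;
# \(...\) captures the text after the last "/" up to the closing quote.
out=$(printf '%s\n' "$sample" | sed -e 's%https://[^"]*/\([^"]*\)%\1.html%')
printf '%s\n' "$out"
# Prints:
# <ul>
# <li><a href="link-1.html"></li>
# <li><a href="link-2.html"></li>
# <li><a href="link-3.html" ></li>
# </ul>
```

Because only the last path segment is captured, a deeper URL such as https://example.com.com/docs/link-4 (a hypothetical example) would also become link-4.html.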
Edit
If you want to make sure the substitution only happens for instances of https://example.com on lines that start with a <li><a ...> tag, you could try:
sed -e '/^<li><a /s%"https://example.com[^"]*/\([^"]*\)%"\1.html%'
Based on the data sample you provided, you should get:
<ul>
<li><a href="link-1.html"></li>
<li><a href="link-2.html"></li>
<li><a href="link-3.html" ></li>
<!-- many more items here -->
</ul>
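For part (b) of the question, the addressed command can be handed to find and run with sed -i, since it is a single sed call with no pipeline. The sketch below assumes GNU sed (BSD/macOS sed expects -i '') and uses a hypothetical temporary directory with one file, pages/index.html. The extra <p> line shows that the /^<li><a / address leaves other lines alone.

```shell
#!/bin/sh
# Hypothetical layout for illustration: a temp dir holding pages/index.html.
dir=$(mktemp -d)
mkdir -p "$dir/pages"
cat > "$dir/pages/index.html" <<'EOF'
<ul>
<li><a href="https://example.com.com/link-1"></li>
<li><a href="https://example.com.com/link-3" ></li>
</ul>
<p><a href="https://example.com.com/about">About</a></p>
EOF

# -i rewrites each matched file in place (GNU sed syntax).
# "{} +" passes many file names to one sed process instead of one per file.
# The /^<li><a / address restricts the s command, so the <p> line is untouched.
find "$dir" -type f -name '*.html' \
  -exec sed -i -e '/^<li><a /s%"https://example.com[^"]*/\([^"]*\)%"\1.html%' {} +

result=$(cat "$dir/pages/index.html")
printf '%s\n' "$result"
rm -rf "$dir"
# Prints:
# <ul>
# <li><a href="link-1.html"></li>
# <li><a href="link-3.html" ></li>
# </ul>
# <p><a href="https://example.com.com/about">About</a></p>
```

If the real files indent their <li> lines, the anchored address would skip them; an address such as /^[[:space:]]*<li><a / is one possible adjustment, not tested against your files.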
Hope that helps.
Answered By - Grobu Answer Checked By - Marilyn (WPSolving Volunteer)