Issue
I have an HTML file with <p>, <h1>, <h2>, and <h3> tags. I need to wrap the <p> tags in braces. When several <p> tags appear on consecutive lines, the opening brace should go before the first <p> and the closing brace after the last closing </p> tag, even if it's a few lines down.
Example content:
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3</p>
<h1>Heading 1</h1>
<p>Paragraph 4</p>
<h3>Heading 2</h3>
Desired output:
{<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3</p>}
<h1>Heading 1</h1>
{<p>Paragraph 4</p>}
<h3>Heading 2</h3>
Note in the desired output that the closing brace comes where the lines switch to an <h1>. The number of consecutive <p> tags can be anywhere from 2 to 20.
My current sed solution just replaces opening <p> tags with an opening brace and closing </p> tags with a closing brace.
sed 's|<p>|{|g' | sed 's|</p>|}|g'
Unfortunately this works line by line. What I need is to match across multiple lines and ignore closing/opening tags if they're followed by another <p> tag, so that consecutive paragraphs are lumped together.
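To make the problem concrete, here is a small sketch of what that pipeline produces on part of the example. The helper name braces_per_line is made up for illustration; the sed commands are the ones quoted above.

```shell
# Hypothetical helper wrapping the sed pipeline from the question.
braces_per_line() {
    sed 's|<p>|{|g' | sed 's|</p>|}|g'
}

printf '%s\n' '<p>Paragraph 1</p>' '<p>Paragraph 2</p>' '<h1>Heading 1</h1>' | braces_per_line
# Each <p> line is rewritten in isolation, and the tags themselves are lost:
# {Paragraph 1}
# {Paragraph 2}
# <h1>Heading 1</h1>
```

Because sed sees one line at a time here, it has no way of knowing whether the next line is another <p>, which is why every paragraph ends up with its own pair of braces.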
I've been unable to find a solution for this yet. I'm happy to use perl, awk, sed, whatever gets the job done. It just seems like I need a way to recognise this particular pattern.
Edit: Ed Morton's solution below worked perfectly for me.
Solution
$ cat tst.awk
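/^<p/ {                                   # a line that starts a paragraph
    ps = (ps == "" ? "" : ps ORS) $0      # append it to the current run of <p> lines
    next                                  # and hold off printing until the run ends
}
ps != "" { print "{" ps "}"; ps="" }      # first non-<p> line: print the run in braces
{ print }                                 # print every other line unchanged
END { if (ps != "") print "{" ps "}" }    # flush a run that reaches the end of the file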
$ awk -f tst.awk file
{<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3</p>}
<h1>Heading 1</h1>
{<p>Paragraph 4</p>}
<h3>Heading 2</h3>
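If a separate script file is inconvenient, the same logic can probably be wrapped in a shell function and used in a pipeline. The sketch below is a variation, not the answer's exact script: the function name wrap_p_runs is made up, and the pattern is tightened to /^<p[ >]/ on the assumption that the input might also contain tags such as <pre>, which the original /^<p/ would treat as paragraphs. For a file containing only <p>, <h1>, <h2>, and <h3> tags, both patterns should behave the same.

```shell
# Hypothetical helper: wrap each run of consecutive <p> lines in braces.
# Reads HTML on stdin and writes the result to stdout.
wrap_p_runs() {
    awk '
        # Assumption: a paragraph line starts with "<p>" or "<p " (attributes),
        # so tags such as <pre> are not mistaken for paragraphs.
        /^<p[ >]/ {
            ps = (ps == "" ? "" : ps ORS) $0    # append the line to the current run
            next
        }
        ps != "" { print "{" ps "}"; ps = "" }  # a non-<p> line ends the run
        { print }
        END { if (ps != "") print "{" ps "}" }  # flush a run that ends the file
    '
}

printf '%s\n' '<p>One</p>' '<p>Two</p>' '<pre>code</pre>' '<p>Three</p>' | wrap_p_runs
# {<p>One</p>
# <p>Two</p>}
# <pre>code</pre>
# {<p>Three</p>}
```

Like the script above, this only looks at the start of each line, so it assumes every <p> element begins on its own line and is closed before the next non-<p> line.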
Answered By - Ed Morton