Thursday, April 7, 2022

[SOLVED] Extract content from div with awk/grep

Issue

Assume the following HTML:

<div class='requirement'>
<div class='req-title'>
The quick brown fox jumps over the lazy dog
</div>
</div>

I want to extract "The quick brown fox jumps over the lazy dog" using a tool like awk or sed. I'm pretty sure it can be done.

I know an HTML parser is the right tool for this job, but this is the only time I'll be dealing with HTML content.


Solution

Assuming the part you want to print is a single line:

$ awk 'f{print; exit} $0=="<div class=\047req-title\047>"{f=1}' file
The quick brown fox jumps over the lazy dog

Otherwise, to print every line up to the closing </div>:

$ awk 'f{if ($0=="</div>") exit; print} $0=="<div class=\047req-title\047>"{f=1}' file
The quick brown fox jumps over the lazy dog
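
Both commands compare the whole line ($0) against the tag exactly, so they only match when the tags sit at the start of the line with nothing after them; \047 is the octal escape for a single quote, which avoids closing the shell's quoting. If the real file is indented, a minimal sketch along the following lines may help. It assumes the only difference is leading or trailing spaces and tabs; the file name sample.html and the function name extract_title are made up for illustration.

```shell
# Hypothetical sample input, indented so an exact $0 match would fail as-is.
cat > sample.html <<'EOF'
<div class='requirement'>
  <div class='req-title'>
    The quick brown fox jumps over the lazy dog
  </div>
</div>
EOF

# extract_title FILE (illustrative helper, not part of the original answer)
# Trims leading/trailing blanks from every line, then applies the same flag
# logic: once the opening tag has been seen, print lines until </div>.
extract_title() {
  awk '
    { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }   # trim spaces and tabs
    f { if ($0 == "</div>") exit; print }        # inside the div: print or stop
    $0 == "<div class=\047req-title\047>" { f = 1 }  # opening tag: start printing from the next line
  ' "$1"
}

extract_title sample.html
```

Because the flag is tested before the rule that sets it, the opening tag itself is never printed. This still assumes each tag is on its own line; anything less regular is where an HTML parser becomes the safer choice.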


Answered By - Ed Morton
Answer Checked By - David Marino (WPSolving Volunteer)