Saturday, October 16, 2021

[SOLVED] How does shell select content within the keyword range?

October 16, 2021 awk, grep, sed, shell

Issue

I have an HTML file containing a large number of <section>...</section> blocks, in the following format.

<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>

<section>
<div>
<header><h2>This is a title (RfQVthHm)</h2></header>
More HTML codes...
</div>
</section>

<section>
<div>
<header><h2>This is a title (UaHaZWvm)</h2></header>
More HTML codes...
</div>
</section>

<section>
<div>
<header><h2>This is a title (vxzbXEGq)</h2></header>
More HTML codes...
</div>
</section>

</body>
</html>

I need to extract the second <section>...</section> content.

This is the expected output.

<section>
<div>
<header><h2>This is a title (UaHaZWvm)</h2></header>
More HTML codes...
</div>
</section>

I noticed that I can look for the UaHaZWvm string first (plus the 2 lines before it) and then read until I encounter the next </section>.

OP's effort (mentioned in comments): grep -o "hi.*bye" file

Can this be done with awk, sed or grep, please?

Solution

With the samples shown, please try the following. It was written and tested in GNU awk and should work in any awk.

awk '
/^<\/section>/{
  if(found1==2 && found2==1){
    print val ORS $0
    exit
  }
  found2++
}
/<section>/{
  found1++
}
found1==2{
  val=(val?val ORS:"")$0
}
'  Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                             ##Start of the awk program.
/^<\/section>/{                   ##If the line starts with </section>:
  if(found1==2 && found2==1){     ##if this closes the 2nd section (2 openings and 1 earlier closing seen),
    print val ORS $0              ##print the collected lines plus this closing tag,
    exit                          ##and stop reading the file.
  }
  found2++                        ##Otherwise count this closing tag.
}
/<section>/{                      ##If the line contains <section>:
  found1++                        ##count this opening tag.
}
found1==2{                        ##While inside the 2nd section:
  val=(val?val ORS:"")$0          ##append the line to val, separated by newlines.
}
'  Input_file                     ##Name of the input file.
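
The counters above hard-code the second section. As a minimal sketch (not part of the original answer), the same counting idea can take the section number as a parameter. The function name nth_section, the variable n and the temporary sample file are assumptions made for illustration, and each tag is assumed to sit on its own line as in the sample.

```shell
# Build a small sample page shaped like the one in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<body>

<section>
<div>
<header><h2>This is a title (RfQVthHm)</h2></header>
More HTML codes...
</div>
</section>

<section>
<div>
<header><h2>This is a title (UaHaZWvm)</h2></header>
More HTML codes...
</div>
</section>

<section>
<div>
<header><h2>This is a title (vxzbXEGq)</h2></header>
More HTML codes...
</div>
</section>

</body>
EOF

# nth_section N FILE: print the N-th <section>...</section> block of FILE.
nth_section() {
  awk -v n="$1" '
    /<section>/                 { count++ }  # count opening tags
    count == n                  { print }    # print every line of the wanted section
    count == n && /<\/section>/ { exit }     # stop right after the closing tag
  ' "$2"
}

nth_section 2 "$sample"
```

Calling nth_section 3 "$sample" would select the third block instead. Nested sections are not handled by this sketch.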

Answered By - RavinderSingh13
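
The question also proposes finding the section through its identifier (UaHaZWvm) instead of its position. A minimal sketch of that approach, separate from the answer above: buffer each section and print the first one whose text contains the key. The names section_with_key, key, buf and inside are illustrative assumptions, and sections are assumed not to be nested.

```shell
# Build a small sample page shaped like the one in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<section>
<div>
<header><h2>This is a title (RfQVthHm)</h2></header>
More HTML codes...
</div>
</section>

<section>
<div>
<header><h2>This is a title (UaHaZWvm)</h2></header>
More HTML codes...
</div>
</section>
EOF

# section_with_key KEY FILE: print the first section that contains KEY.
section_with_key() {
  awk -v key="$1" '
    /<section>/ { buf = ""; inside = 1 }   # new section: start a fresh buffer
    inside      { buf = buf $0 ORS }       # collect every line of the section
    inside && /<\/section>/ {              # section complete
      inside = 0
      if (index(buf, key)) {               # literal substring test, not a regex
        printf "%s", buf
        exit
      }
    }
  ' "$2"
}

section_with_key UaHaZWvm "$sample"
```

Because the whole section is buffered before the test, the key may appear on any line of the section, so there is no need to count lines backwards from the match.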

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
