Issue
I am writing a C program that reads a file and then uses grep to parse it and print out the text between HTML tags. Using:
grep -o '<font color="#FFCC00" SIZE=+1>' PROJECTS.HTML
I get a correct output of 41 instances of that tag.
The same goes for grep -o '</font>' PROJECTS.HTML.
Is it not possible to use grep to extract what is between the tags? For example, using: grep -o '(<font color="#FFCC00" SIZE=+1>).*?(</font>)' PROJECTS.HTML
Here is a sample of the HTML:
<li>
<A HREF="./CS448/marbles_2020_1.tar" target="main">
<font color="#FFCC00" SIZE=+1>
Marbles - A marbles game (V2020) written on OPENGL for Linux.
</font>
</A>
</li>
<li>
<A HREF="./CS448/marbles_2020_2.tar" target="main">
<font color="#FFCC00" SIZE=+1>
Marbles - A marbles game (V2020) written on OPENGL for Linux.
</font>
</A>
</li>
</ul>
Solution
A million years ago, I would have used something like:
/<font color="#FFCC00" SIZE=+1>/,/<\/font>/p
as a script for sed (run with -n, so that only the lines in that range are printed)
...
(This goes back a long way... I may be wrong about bits of that.)
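For what it's worth, here is a minimal sketch of that sed range idea. It assumes the tags sit on their own lines, as in the sample from the question, and it also drops the two tag lines so only the text in between is printed. The function name extract_font_text and the temporary sample file are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines that fall between the opening
# <font ...> line and the closing </font> line of the file given as $1.
# Assumes each tag is on a line of its own, as in the sample HTML.
extract_font_text() {
    sed -n '/<font color="#FFCC00" SIZE=+1>/,/<\/font>/{
        /<font color="#FFCC00" SIZE=+1>/d
        /<\/font>/d
        p
    }' "$1"
}

# Usage on a small sample file copied from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<li>
<A HREF="./CS448/marbles_2020_1.tar" target="main">
<font color="#FFCC00" SIZE=+1>
Marbles - A marbles game (V2020) written on OPENGL for Linux.
</font>
</A>
</li>
EOF
extract_font_text "$sample"
rm -f "$sample"
```

On that sample this should print just the "Marbles - ..." line. If the opening and closing tags ever share a line with the text, a line-based range like this will not be enough.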
It seems both upper and lower case have been used for the attributes. This could be a problem...
If, as you wrote, this is all that is in the file, think in reverse and "grep out" the lines/tags you don't want. Each pattern needs its own -e, otherwise grep takes the second one as a file name. Something like:
grep -v -e "<li>" -e "</li>"...
Just an idea...
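A rough sketch of that "grep out" idea, for illustration only. The pattern list is an assumption built from the tags visible in the sample, not something taken from the real file, and strip_markup_lines is a made-up name. The -i flag is there because of the mixed upper/lower case mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: drop every line that is only markup and keep the rest.
# The patterns are guesses based on the sample HTML in the question.
strip_markup_lines() {
    grep -v -i \
        -e '^<li>$' -e '^</li>$' \
        -e '^<A HREF=' -e '^</A>$' \
        -e '^<font ' -e '^</font>$' \
        -e '^</ul>$' \
        "$1"
}

# Usage on a small sample file copied from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<li>
<A HREF="./CS448/marbles_2020_1.tar" target="main">
<font color="#FFCC00" SIZE=+1>
Marbles - A marbles game (V2020) written on OPENGL for Linux.
</font>
</A>
</li>
EOF
strip_markup_lines "$sample"
rm -f "$sample"
```

This only works if every unwanted line matches one of the patterns, so any other tag or stray text in the real file would slip through and need its own -e.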
Answered By - Fe2O3 Answer Checked By - Katrina (WPSolving Volunteer)