Issue
Using this command:
sed -n '/<article class.*article--nyheter/,/<\/article>/p' news2.html > onlyArticles.html
I get every matching article tag in my HTML document, which is more than 50 articles.
Sample input:
<article class="article column large-12 small-12 article--nyheter">
... variable number of lines of data
</article>
<article class="article column large-12 small-12 article--nyheter">
... variable number of lines of data
</article>
<article class="article column large-12 small-12 article--nyheter">
... variable number of lines of data
</article>
<article class="article column large-12 small-12 article--nyheter">
... variable number of lines of data
</article>
I want only the first x articles, for example just the top 2.
Desired output:
<article class="article column large-12 small-12 article--nyheter">
... variable number of lines of data
</article>
<article class="article column large-12 small-12 article--nyheter">
... variable number of lines of data
</article>
This is just an example. What I am trying to achieve is to select only the first x matching nodes.
Is there any way to do it? I cannot simply use head or tail, because I need to extract whole matching elements, not just some number of lines.
Solution
xmllint with an XPath expression can select tags by position:
xmllint --html --recover --xpath '//article[position()<=2]' tmp.html 2>/dev/null
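One caveat with the expression above: in //article[position()<=2], position() is evaluated separately under each parent element, so if the articles are spread over several parents, more than two can come back. Wrapping the path in parentheses makes the position apply to the whole list of matches. Below is a minimal sketch of that variant, assuming the articles should also be filtered by the article--nyheter class from the question. The sample file contents and the variable n are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample file standing in for news2.html; the articles are
# deliberately placed under two different <div> parents.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<html><body>
<div>
<article class="article column large-12 small-12 article--nyheter">
<p>first</p>
</article>
</div>
<div>
<article class="article column large-12 small-12 article--nyheter">
<p>second</p>
</article>
<article class="article column large-12 small-12 article--nyheter">
<p>third</p>
</article>
</div>
</body></html>
EOF

n=2   # number of articles to keep (illustrative variable)

# The parentheses make position() count over all matched articles in
# document order, instead of restarting under each parent element.
result=$(xmllint --html --recover \
  --xpath "(//article[contains(@class,'article--nyheter')])[position()<=$n]" \
  "$tmp" 2>/dev/null)

printf '%s\n' "$result"
rm -f "$tmp"
```

With this sample, the unparenthesized form would match all three articles, since each one is at position 1 or 2 under its own div. Note that contains(@class, ...) is a plain substring test, so it would also match a longer class name that merely contains article--nyheter.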
Answered By - LMC