Issue
I am trying to get the meta title of some website...
some people write title like
`<title>AllHeart Web INC, IT Services Digital Solutions Technology
</title>
`
`<title>AllHeart Web INC, IT Services Digital Solutions Technology</title>`
`<title>
AllHeart Web INC, IT Services Digital Solutions Technology
</title>`
some like more ways... my current focus on above 3 ways...
I wrote a simple code, it only capture 2nd way of title written, but i am not sure how can I grep the other ways,
`curl -s https://allheartweb.com/ | grep -o '<title>.*</title>'`
I also made a code (very bad i guess)
where i can grep number of line like
`
% curl -s https://allheartweb.com/ | grep -n '<title>'
7:<title>AllHeart Web INC, IT Services Digital Solutions Technology
% curl -s https://allheartweb.com/ | grep -n '</title>'
8:</title>
`
and store it and run loop to get title item... which i guess a bad idea...
any help I can get all possible of getting title?
Solution
Try this:
curl -s https://allheartweb.com/ | tr -d '\n' | grep -m 1 -oP '(?<=<title>).+?(?=</title>)'
You can remove newlines from HTML via tr
because they have no meaning in the title. The next step returns the first match of the shortest string enclosed in <title>
</title>
.
This is quite a simple approach of course. xmllint
would be better but that's not available to all platforms by default.
Answered By - Newerth Answer Checked By - Marilyn (WPSolving Volunteer)