Issue
I want to get the source link of the first image that appears in the Bing image search results for a specified search term.
I am currently using this command, but get no output:
curl -s "https://www.bing.com/images/search?q=cat&form=HDRSC2&first=1&tsc=ImageBasicHover" | grep -o '<a class="thumb" target="_blank" href="[^"]*'
Running only curl -s "https://www.bing.com/images/search?q=cat&form=HDRSC2&first=1&tsc=ImageBasicHover"
displays the HTML code of the page, so the request itself works.
What am I doing wrong?
Solution
You should generally avoid parsing HTML with regex, which bobince explains better than I can here: https://stackoverflow.com/a/1732454/1067003
Use an HTML parser instead. For example, PHP can parse HTML with its DOMDocument API:
curl -s 'https://www.bing.com/images/search?q=cat&form=HDRSC2&first=1&tsc=ImageBasicHover' | php -r '$html = stream_get_contents(STDIN);$domd=new DOMDocument();@$domd->loadHTML($html);$xp = new DOMXPath($domd);var_dump($xp->query("//a[@data-hookid='\''pgdom'\'']")->item(0)->getAttribute("href"));'
This prints
string(34) "https://pxhere.com/en/photo/609263"
which is the source of the first image.
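A note on the quoting in that one-liner, since it is easy to misread: the PHP code is wrapped in single quotes for the shell, and the '\'' sequences are shell quoting rather than PHP syntax. A minimal sketch of what the shell actually hands to PHP as the XPath expression (the variable name xpath is only for illustration and does not appear in the command above):

```shell
# Inside single quotes the shell treats every character literally, so a
# single quote cannot appear there directly. The sequence '\'' closes the
# string, adds one escaped quote, and reopens the string.
xpath='//a[@data-hookid='\''pgdom'\'']'

# Show the expression after the shell has processed the quoting.
printf '%s\n' "$xpath"
```

This should print //a[@data-hookid='pgdom'], i.e. an ordinary XPath attribute test selecting a elements whose data-hookid attribute is pgdom.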
Answered By - hanshenrik
Answer Checked By - Timothy Miller (WPSolving Admin)