Issue
I'm trying to programmatically gather information from a web page using cURL. The information I need is very basic, and the page is pretty basic.
When using cURL I receive a 503 error. When I visit the same page in a browser on the same machine, the page loads fine. I read this could be caused by the site requiring a cookie to be passed when it is queried. I've tried this, but admittedly, I could be doing it wrong (I snagged the cookie from the web browser's inspector).
curl --cookie "sessionId=.eJxrYKotZNQI5S9OLS7OzM-LT81LTMpJTfFmChVIzEktKolPzkhNzo4vycxNLWRKTkksSQUxueCMQuZQLvaHHGI82lqMp0KTCypLqrjiQ0OcuQpZNIMKWduCCtlCuUvyi-NLC0B6UgrZO0v1ACyMJy0:1dk8X0:WIgK35IaFa7RbCe7EqpSMtLjK9w" https://www.appannie.com/en/apps/ios/app/284815942/ -o /tmp/test.html
I'm a really basic user with very rudimentary knowledge, so there is a good chance I'm missing something obvious. I've gathered that the site I'm attempting to access is using nginx, in case that is an important caveat.
Solution
Some sites block curl's default user-agent, and some reject requests that are missing the headers a browser normally sends. I tried the curl command below and it works great:
curl 'https://www.appannie.com/en/apps/ios/app/284815942/' -H 'pragma: no-cache' -H 'dnt: 1' -H 'accept-encoding: gzip, deflate, br' -H 'accept-language: en-US,en;q=0.8' -H 'upgrade-insecure-requests: 1' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'cache-control: no-cache' -H 'authority: www.appannie.com' --compressed
This is the request as the browser sends it, and that is what you should try to replicate.
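To make it easier to see which headers actually matter, the request above can be wrapped in a small shell function that prints the HTTP status code. This is only a sketch: the function name `fetch_like_browser` and its two arguments are made up for illustration, the header set is a trimmed subset of the command above, and whether that subset is enough for this particular site is an assumption you would need to verify.

```shell
# Hypothetical helper (not part of curl): fetch a URL with a few
# browser-like headers, save the body, and print the HTTP status code.
# Usage: fetch_like_browser URL OUTPUT_FILE
fetch_like_browser() {
  if [ "$#" -ne 2 ]; then
    echo "usage: fetch_like_browser URL OUTPUT_FILE" >&2
    return 2
  fi
  # --compressed lets curl request and decode a compressed response itself.
  # --write-out prints the status code after the transfer completes.
  curl --silent --show-error --compressed \
    --user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36' \
    --header 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
    --header 'accept-language: en-US,en;q=0.8' \
    --output "$2" \
    --write-out '%{http_code}\n' \
    "$1"
}
```

For example, `fetch_like_browser 'https://www.appannie.com/en/apps/ios/app/284815942/' /tmp/test.html` prints the status code and writes the page to /tmp/test.html. If the status is still 503, add the remaining headers from the full command one at a time until the response changes.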
Answered By - Tarun Lalwani Answer Checked By - Gilberto Lyons (WPSolving Admin)