Issue
I have two URLs: one points to a working page, the other to a deleted page. The working URL is fine, but for the deleted-page URL wget receives a 404 and returns nothing instead of the page content.
import os

def curl(url):
    # Run wget quietly (-q) and write the page body to stdout (-O-)
    data = os.popen('wget -qO- %s' % url).read()
    print(url)
    print(len(data))
    #print(data)

curl("https://www.reverbnation.com/artist_41/bio")
Output:
https://www.reverbnation.com/artist_41/bio
80067
Deleted-page URL:
import os

def curl(url):
    # Same call as before, this time against the deleted page
    data = os.popen('wget -qO- %s' % url).read()
    print(url)
    print(len(data))
    #print(data)

curl("https://www.reverbnation.com/artist_42/bio")
Output:
https://www.reverbnation.com/artist_42/bio
0
I get a length of 0, but the live page has some content in it.
How can I receive the actual page content with wget or curl?
Solution
wget has a switch called "--content-on-error":
--content-on-error
If this is set to on, wget will not skip the content when the server
responds with an HTTP status code that indicates an error.
So just add it to your code and you will have the "content" of the 404 pages too:
import os

def curl(url):
    # --content-on-error keeps the response body even on 404 and other error statuses
    data = os.popen('wget --content-on-error -qO- %s' % url).read()
    print(url)
    print(len(data))
    #print(data)

curl("https://www.reverbnation.com/artist_42/bio")
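As a variation on the code above, here is a minimal sketch that calls wget through subprocess instead of os.popen. The helper names build_wget_command and fetch are made up for illustration and are not part of the original answer. Passing the arguments as a list keeps the URL away from the shell, and returning the exit status lets you tell an error page from a normal one (wget documents exit status 8 for "server issued an error response").

```python
import subprocess


def build_wget_command(url):
    # Hypothetical helper: arguments are passed as a list, so the URL
    # is never interpreted by a shell.
    return ["wget", "--content-on-error", "-q", "-O", "-", url]


def fetch(url):
    # check=False: wget exits non-zero on an HTTP error response, and we
    # still want the body that --content-on-error wrote to stdout.
    result = subprocess.run(
        build_wget_command(url),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    # stdout is returned as bytes; decode it if you need text.
    return result.returncode, result.stdout
```

Calling status, body = fetch("https://www.reverbnation.com/artist_42/bio") and printing len(body) should then give a non-zero length for the deleted page, assuming wget is installed and the server still sends a body with its 404.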
Answered By - Jadi Answer Checked By - Marilyn (WPSolving Volunteer)