Issue
ipinfo.io provides information about the server behind an IP address, either through their website or by sending a request with the curl command-line utility, e.g.:
$ curl https://ipinfo.io/172.217.169.6
outputs, in JSON format:
{
"ip": "172.217.169.6",
"hostname": "lhr25s26-in-f6.1e100.net",
"city": "London",
"region": "England",
"country": "GB",
"loc": "51.5085,-0.1257",
"org": "AS15169 Google LLC",
"postal": "EC1A",
"timezone": "Europe/London",
"readme": "https://ipinfo.io/missingauth"
}
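For reference, once the response body is available as bytes, turning it into a Python object needs only the standard library's json module. A minimal sketch, with the body from the verbose curl output in this post pasted in as a literal named sample_body (a hypothetical name) rather than fetched over the network:

```python
import json

# Hypothetical sample: the JSON body curl printed for https://ipinfo.io/172.217.169.6
sample_body = b'''{
  "ip": "172.217.169.6",
  "hostname": "lhr25s26-in-f6.1e100.net",
  "city": "London",
  "region": "England",
  "country": "GB",
  "loc": "51.5085,-0.1257",
  "org": "AS15169 Google LLC",
  "postal": "EC1A",
  "timezone": "Europe/London",
  "readme": "https://ipinfo.io/missingauth"
}'''

# Decode with the charset the API declares (utf-8), then parse into a dict
details = json.loads(sample_body.decode("utf-8"))

# Fields are now ordinary dict entries
print(details["city"], details["org"])
```

The same json.loads call works on buffer.getvalue() from pycURL, provided the server actually sent JSON rather than HTML.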
What I eventually want to do is make this request in Python and store the result as a JSON object. I believe the following code, using pycURL, should produce the same output:
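import pycurl
from io import BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://ipinfo.io/172.217.169.6")
c.setopt(c.WRITEDATA, buffer)  # write the response body into the in-memory buffer
c.perform()
c.close()  # close() has to be called to release the handle
body = buffer.getvalue()
print(body.decode('utf-8'))  # the server declares charset=utf-8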
i.e., write the same JSON string into the buffer.
However, it instead prints a large HTML document, which I suspect is the HTML of the web page pycURL is requesting rather than the JSON data, e.g.:
<!DOCTYPE html>
<html>
<head>
<title>
172.217.169.6 IP Address Details
- IPinfo.io</title>
<meta charset="utf-8">
<meta name="apple-itunes-app" content="app-id=917634022">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no, user-scalable=no">
<meta name="description" content="Full IP address details for 172.217.169.6 (AS15169 Google LLC) including geolocation and map, hostname, and API details.">
<link rel="manifest" href="/static/manifest.json">
<link rel="icon" sizes="48x48" href="/static/deviceicons/android-icon-48x48.png">
...
</html>
Basically, how can I get pycURL to also receive this JSON data?
I compared the verbose output of both and couldn't figure out why they behave differently. The only difference I spotted is the content-type response header: "application/json" for curl and "text/html" for pycURL, which explains the different outputs. At the risk of making this post long-winded, I've included both below:
curl (command line) verbose output:
$ curl -v https://ipinfo.io/172.217.169.6
* Trying 34.117.59.81:443...
* TCP_NODELAY set
* Connected to ipinfo.io (34.117.59.81) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=ipinfo.io
* start date: Jul 10 20:18:59 2021 GMT
* expire date: Oct 8 21:18:59 2021 GMT
* subjectAltName: host "ipinfo.io" matched cert's "ipinfo.io"
* issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1D4
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55a887a40e10)
> GET /172.217.169.6 HTTP/2
> Host: ipinfo.io
> user-agent: curl/7.68.0
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 200
< access-control-allow-origin: *
< x-frame-options: DENY
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< referrer-policy: strict-origin-when-cross-origin
< content-type: application/json; charset=utf-8
< content-length: 286
< date: Tue, 27 Jul 2021 21:03:50 GMT
< x-envoy-upstream-service-time: 1
< via: 1.1 google
< alt-svc: clear
<
{
"ip": "172.217.169.6",
"hostname": "lhr25s26-in-f6.1e100.net",
"city": "London",
"region": "England",
"country": "GB",
"loc": "51.5085,-0.1257",
"org": "AS15169 Google LLC",
"postal": "EC1A",
"timezone": "Europe/London",
"readme": "https://ipinfo.io/missingauth"
* Connection #0 to host ipinfo.io left intact
}
pycURL verbose output:
$ python3 ip_helper.py
* Trying 34.117.59.81:443...
* TCP_NODELAY set
* Connected to ipinfo.io (34.117.59.81) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=ipinfo.io
* start date: Jul 10 20:18:59 2021 GMT
* expire date: Oct 8 21:18:59 2021 GMT
* subjectAltName: host "ipinfo.io" matched cert's "ipinfo.io"
* issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1D4
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x19d65c0)
> GET /172.217.169.6 HTTP/2
Host: ipinfo.io
user-agent: PycURL/7.43.0.6 libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
accept: */*
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 200
< access-control-allow-origin: *
< x-frame-options: DENY
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< referrer-policy: strict-origin-when-cross-origin
< content-type: text/html; charset=utf-8
< content-length: 44645
< date: Tue, 27 Jul 2021 21:07:50 GMT
< x-envoy-upstream-service-time: 13
< via: 1.1 google
< alt-svc: clear
<
* Connection #0 to host ipinfo.io left intact
<!DOCTYPE html>
<html>
<head>
<title>
172.217.169.6 IP Address Details
- IPinfo.io</title>
<meta charset="utf-8">
<meta name="apple-itunes-app" content="app-id=917634022">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no, user-scalable=no">
<meta name="description" content="
Full IP address details for 172.217.169.6 (AS15169 Google LLC) including geolocation and map, hostname, and API details.
">
<link rel="manifest" href="/static/manifest.json">
<link rel="icon" sizes="48x48" href="/static/deviceicons/android-icon-48x48.png">
...
</html>
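The comparison above comes down to the content-type response header, so it can help to check that header in code before trying to parse the body. A minimal sketch, assuming pycURL's HEADERFUNCTION callback, which hands over each raw response header line as bytes; HeaderCollector and is_json are hypothetical names for illustration:

```python
class HeaderCollector:
    """Collects response headers passed in one line at a time,
    e.g. by pycURL's HEADERFUNCTION callback (lines arrive as bytes)."""

    def __init__(self):
        self.headers = {}

    def __call__(self, header_line):
        # HTTP header lines are conventionally decoded as ISO-8859-1
        line = header_line.decode("iso-8859-1")
        # Skip the status line ("HTTP/2 200") and the blank end-of-headers line
        if ":" not in line:
            return
        # Split on the first colon only; values such as dates contain colons
        name, value = line.split(":", 1)
        # Lowercase names so lookups work for both HTTP/1.1 and HTTP/2 responses
        self.headers[name.strip().lower()] = value.strip()

    def is_json(self):
        # Compare only the media type, ignoring parameters like "; charset=utf-8"
        content_type = self.headers.get("content-type", "")
        return content_type.split(";", 1)[0].strip().lower() == "application/json"
```

To use it, create collector = HeaderCollector(), register it with c.setopt(c.HEADERFUNCTION, collector) before c.perform(), and check collector.is_json() before decoding the buffer as JSON.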
Thank you for your time
Solution
From the docs:
We try to automatically detect when someone wants to call our API versus view our website, and then we send back the appropriate JSON response rather than HTML. We do this based on the user agent for known popular programming languages, tools, and frameworks. However, there are a couple of other ways to force a JSON response when it doesn't happen automatically. One is to add /json to the URL, and the other is to set an Accept header to application/json
So it looks like there are three different ways to get JSON back using pycurl:
- Append /json to your URL:
c.setopt(c.URL, "https://ipinfo.io/172.217.169.6/json")
- Set your Accept header to only allow JSON responses:
c.setopt(c.HTTPHEADER, ["Accept: application/json"])
- Set your User-Agent header to make the website think it's talking to curl instead of pycurl:
c.setopt(c.HTTPHEADER, ["User-Agent: curl"])
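Putting the first two options together with the question's code gives roughly the following. This is a sketch under stated assumptions, not a tested client: ipinfo_json_url, parse_json_body and fetch_ip_details are hypothetical helper names, pycurl is imported inside fetch_ip_details so the other helpers work without it installed, and the body is assumed to be UTF-8 JSON as the content-type header in the question reports.

```python
import json
from io import BytesIO


def ipinfo_json_url(ip):
    # Option 1: append /json so the site returns JSON regardless of user agent
    return "https://ipinfo.io/{}/json".format(ip)


def parse_json_body(body):
    # Assumes "content-type: application/json; charset=utf-8", as in the question
    return json.loads(body.decode("utf-8"))


def fetch_ip_details(ip):
    # Imported here so the helpers above can be used without pycurl installed
    import pycurl

    buffer = BytesIO()
    c = pycurl.Curl()
    try:
        c.setopt(c.URL, ipinfo_json_url(ip))
        # Option 2: also ask explicitly for JSON via the Accept header
        c.setopt(c.HTTPHEADER, ["Accept: application/json"])
        c.setopt(c.WRITEDATA, buffer)
        c.perform()
    finally:
        # Release the handle even if perform() raises
        c.close()
    return parse_json_body(buffer.getvalue())
```

With network access, fetch_ip_details("172.217.169.6") should then return a dict like the one curl printed. Using both the /json suffix and the Accept header is redundant, but it means the request does not depend on the site recognising the pycURL user agent.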
Answered By - Woodford