Thursday, October 27, 2022

[SOLVED] Trying to make a POST request, works with cURL, get a 403 when using Python requests

Issue

I'm trying to get some JSON data from this API - https://ped.uspto.gov/api/queries

This cURL request works fine and returns what is expected:

curl -X POST "https://ped.uspto.gov/api/queries" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"searchText\":\"*:*\", \"fq\":[ \"totalPtoDays:[1 TO 99999]\", \"appFilingDate:[2005-01-01T00:00:00Z TO 2005-12-31T23:59:59Z]\" ], \"fl\":\"*\", \"mm\":\"100%\", \"df\":\"patentTitle\", \"facet\":\"true\", \"sort\":\"applId asc\", \"start\":\"0\"}"

I have this Python script to do the same thing:

import json

import requests
from requests.structures import CaseInsensitiveDict

url = "https://ped.uspto.gov/api/queries"

headers = CaseInsensitiveDict()
headers["accept"] = "application/json"
headers["Content-Type"] = "application/json"

data = json.dumps({
   "searchText":"*:*",
   "fq":[
      "totalPtoDays:[1 TO 99999]",
      "appFilingDate:[2005-01-01T00:00:00Z TO 2005-12-31T23:59:59Z]"
   ],
   "fl":"*",
   "mm":"100%",
   "df":"patentTitle",
   "facet":"true",
   "sort":"applId asc",
   "start":"0"
})


resp = requests.post(url, headers=headers, data=data)

print(resp.status_code)

But it returns a 403 status code and the following response headers:

   "Date":"Mon, 24 Oct 2022 16:13:58 GMT",
   "Content-Type":"text/html",
   "Content-Length":"919",
   "Connection":"keep-alive",
   "X-Cache":"Error from cloudfront",
   "Via":"1.1 d387fec28536c5aa92926c56363afe9a.cloudfront.net (CloudFront)",
   "X-Amz-Cf-Pop":"LHR50-P8",
   "X-Amz-Cf-Id":"RMd69prehvXNAl97mo0qyFtuBIiY8r9liIxcQEmbdoBV1zwXLhirXA=="

I'm at quite a loss as to what to do, because I really don't understand what my Python is missing to replicate the cURL request.

Thanks very much.


Solution

I was interested in this, so I got an account with uspto.gov and acquired an access key. Their other APIs work well, but with the PEDS API I kept getting a 504 Gateway Timeout error from CloudFront. While I was on their website I looked into the PEDS API, and I could not load any link to a https://ped.uspto.gov page.

I called them and they gave me an email address. I got this reply:

The PEDS API was taken down, because repeated data mining was bringing the entire PEDS System down.

The PEDS Team is working on a solution to fix the PEDS API, so that it can be re-enabled.


I tried it using PHP.
CDN-level filtering has been causing a lot of problems for curl; note that the headers in the question come from Amazon CloudFront, not Cloudflare.
I got a timeout.
I may have gotten past the 403 Forbidden, but I did not have credentials, so the server dropped the connection.

An HTTP 504 status code (Gateway Timeout) indicates that when CloudFront forwarded a request to the origin (because the requested object wasn't in the edge cache), one of the following happened: The origin returned an HTTP 504 status code to CloudFront. The origin didn't respond before the request expired.
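
The headers posted in the question already hint at where the error comes from: "X-Cache: Error from cloudfront" and a Via entry ending in "(CloudFront)" show that the response was delivered by CloudFront and flagged as an error there. Below is a minimal sketch of that check. It assumes only the header names and values shown in the question; the function name describe_cloudfront_response is made up for illustration, and the check is a heuristic, not an official API.

```python
def describe_cloudfront_response(status_code, headers):
    """Heuristically describe a response based on its X-Cache and Via headers.

    Assumption: only the header values shown in the question are used.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    x_cache = lowered.get("x-cache", "").lower()
    via = lowered.get("via", "").lower()

    if "cloudfront" not in x_cache and "cloudfront" not in via:
        return "no sign of CloudFront in the headers"
    if status_code >= 400 and x_cache.startswith("error from cloudfront"):
        return f"{status_code} error flagged by CloudFront"
    return "served through CloudFront"


# Headers copied from the question.
question_headers = {
    "Content-Type": "text/html",
    "X-Cache": "Error from cloudfront",
    "Via": "1.1 d387fec28536c5aa92926c56363afe9a.cloudfront.net (CloudFront)",
}
print(describe_cloudfront_response(403, question_headers))
```

A result like this says the request was rejected somewhere in the CloudFront layer or behind it. It does not say why, which matches the reply from the PEDS team that the API itself was disabled.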







This is a conversion of your curl command.
The Content-Type: application/json header is added automatically when you pass the data with the json= parameter.
I do not know about your json.dumps call or about wrapping the JSON in parentheses.

import requests

headers = {
    'accept': 'application/json',
}

json_data = {
    'searchText': '*:*',
    'fq': [
        'totalPtoDays:[1 TO 99999]',
        'appFilingDate:[2005-01-01T00:00:00Z TO 2005-12-31T23:59:59Z]',
    ],
    'fl': '*',
    'mm': '100%',
    'df': 'patentTitle',
    'facet': 'true',
    'sort': 'applId asc',
    'start': '0',
}

response = requests.post('https://ped.uspto.gov/api/queries', headers=headers, json=json_data)
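
To see what requests would put on the wire without contacting the server, the request can be prepared and inspected locally. This is a sketch, assuming a reasonably recent requests release; the shortened payload and the variable names are for illustration only. It suggests that json=... and data=json.dumps(...) with an explicit Content-Type header describe the same request, so the 403 is unlikely to come from how the body is encoded.

```python
import json

import requests

url = "https://ped.uspto.gov/api/queries"
payload = {"searchText": "*:*", "start": "0"}  # shortened payload, illustration only

# Variant 1: let requests serialise the dict and set Content-Type itself.
via_json = requests.Request(
    "POST", url, headers={"accept": "application/json"}, json=payload
).prepare()

# Variant 2: serialise by hand and set Content-Type explicitly (as in the question).
via_data = requests.Request(
    "POST",
    url,
    headers={"accept": "application/json", "Content-Type": "application/json"},
    data=json.dumps(payload),
).prepare()

# Nothing has been sent; prepare() only builds the request locally.
print(via_json.headers["Content-Type"])
print(via_data.headers["Content-Type"])
print(json.loads(via_json.body) == json.loads(via_data.body))
```

If both variants print the same Content-Type and the bodies compare equal, the remaining differences from cURL lie in things like the default User-Agent, not in the JSON itself.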


Answered By - Misunderstood
Answer Checked By - Mildred Charles (WPSolving Admin)