Issue
I am looking for a lightweight webserver-setup acting as an REST-API cache for multiple external databases.
Fixed requirements for the project: use Python3 on CentOS7
Guideline: Create a lightweight webserver, which needs to be robust and fast
Use-case scenario: During service-start I need to cache data from 10 external database-servers in RAM. Each server has 5 tables with ca. 100k rows each. So in sum I need to merge the data into 5 tables with ca. 1 mio. entries. Every 10 mins I need to query the servers again to identify new/removed rows and update the cache. The webserver will receive requests to lookup a single entry from the cache, filtered by table and a given search-condition (like "field_1" ="value_X").
Expected web-load: avg. 1 request/sec., (rare) peak-load ca. 100 requests/sec.
Now my question to the above scenario:
- I can get the data form the DB-servers as json, xml or csv. Which format is the recommended one for the use case (fast "inserts" into a table with 1 mio rows)?
- How shoud I store the data in the memory? pandas dataframes?
- In sum, what is the recommended framework for all this? pandas, gunicorn, supervisor & nginx?
Many thanks for any input.
Solution
To deserialize your data, CSV will be the fastest method in most cases. This allows you to read multiple lines in different threads and reduces complexity.
To store the data, I would recommend going either with the most performant solution, which likely is using an existing (No)SQL database implementation, or going the programatically easier way using SQLite in-memory database. Pandas is better for analysis, while I understand that you want to get a similar functionality like a normal DBMS to just fetch data. SQLite is faster (and easier) than Pandas for those use cases.
In your use case, I would recommend using the FastAPI library for servicing the API automatically in multiple threads within Python. You do not need another webserver in front of it, unless you want to do caching there. The script can access the in-memory database or dedicated DBMS application from within those threads. Using supervisor depends on your use case - inside a Container or if the script runs as a service it will not be needed.
Answered By - James Cameron Answer Checked By - Terry (WPSolving Volunteer)