Issue
I have a VPS running Nginx, Django, Postgres and a Golang microservice in a Docker Compose environment. Recently I noticed that it consistently hits 100% CPU utilization and has stopped working. I suspect this may be due to a DDoS attack or odd gunicorn behavior.
VPS OS: Ubuntu 22.04.2 LTS
Observations:
The high CPU usage started about 24 hours ago.
Steps Taken:
Set up and configured ufw and a DigitalOcean firewall.
NGINX Logs
...
89.44.9.51 - - [29/Sep/2023:05:09:58 +0000] "OPTIONS /webclient/api/MyPhone/session HTTP/1.1" 444 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) 3CXDesktopApp/18.13.959 Chrome/112.0.5615.165 Electron/24.3.0 Safari/537.36" "-"
89.44.9.51 - - [29/Sep/2023:05:10:01 +0000] "GET /provisioning/5lbr5h6kse0q/TcxProvFiles/3cxProv_YU8SR32OGF200.xml HTTP/1.1" 444 0 "-" "electron-fetch/1.0 electron (+https://github.com/arantes555/electron-fetch)" "-"
89.44.9.51 - - [29/Sep/2023:05:10:07 +0000] "GET /provisioning/5lbr5h6kse0q/TcxProvFiles/3cxProv_YU8SR32OGF200.xml HTTP/1.1" 444 0 "-" "electron-fetch/1.0 electron (+https://github.com/arantes555/electron-fetch)" "-"
89.44.9.51 - - [29/Sep/2023:05:10:12 +0000] "GET /provisioning/5lbr5h6kse0q/TcxProvFiles/3cxProv_YU8SR32OGF200.xml HTTP/1.1" 444 0 "-" "electron-fetch/1.0 electron (+https://github.com/arantes555/electron-fetch)" "-"
89.44.9.51 - - [29/Sep/2023:05:10:17 +0000] "GET /provisioning/5lbr5h6kse0q/TcxProvFiles/3cxProv_YU8SR32OGF200.xml HTTP/1.1" 444 0 "-" "electron-fetch/1.0 electron (+https://github.com/arantes555/electron-fetch)" "-"
89.44.9.51 - - [29/Sep/2023:05:10:22 +0000] "GET /provisioning/5lbr5h6kse0q/TcxProvFiles/3cxProv_YU8SR32OGF200.xml HTTP/1.1" 444 0 "-" "electron-fetch/1.0 electron (+https://github.com/arantes555/electron-fetch)" "-"
89.44.9.51 - - [29/Sep/2023:05:10:27 +0000] "GET /provisioning/5lbr5h6kse0q/TcxProvFiles/3cxProv_YU8SR32OGF200.xml HTTP/1.1" 444 0 "-" "electron-fetch/1.0 electron (+https://github.com/arantes555/electron-fetch)" "-"
89.44.9.51 - - [29/Sep/2023:05:10:32 +0000] "GET /provisioning/5lbr5h6kse0q/TcxProvFiles/3cxProv_YU8SR32OGF200.xml HTTP/1.1" 444 0 "-" "electron-fetch/1.0 electron (+https://github.com/arantes555/electron-fetch)" "-"
89.44.9.51 - - [29/Sep/2023:05:10:38 +0000] "GET /provisioning/5lbr5h6kse0q/TcxProvFiles/3cxProv_YU8SR32OGF200.xml HTTP/1.1" 444 0 "-" "electron-fetch/1.0 electron (+https://github.com/arantes555/electron-fetch)" "-"
...
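In the excerpt above, every request gets status 444, which nginx uses to close the connection without sending a response, so these requests are dropped before they reach gunicorn. To check whether request volume is a plausible cause at all, one option is to count requests per client and status code. The following is a minimal sketch, assuming the log format shown above; the `summarize` helper is a hypothetical name, and the sample lines are taken from the excerpt with the user agent shortened.

```python
import re
from collections import Counter

# Matches the start of an access-log line in the format shown above:
# client IP, two ignored fields, [timestamp], "request line", status code.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def summarize(lines):
    """Count requests per (client IP, status code) pair; skip unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match["ip"], match["status"])] += 1
    return counts


# Two lines from the excerpt (user agent shortened for readability).
sample = [
    '89.44.9.51 - - [29/Sep/2023:05:09:58 +0000] "OPTIONS /webclient/api/MyPhone/session HTTP/1.1" 444 0 "-" "Mozilla/5.0" "-"',
    '89.44.9.51 - - [29/Sep/2023:05:10:01 +0000] "GET /provisioning/5lbr5h6kse0q/TcxProvFiles/3cxProv_YU8SR32OGF200.xml HTTP/1.1" 444 0 "-" "electron-fetch/1.0" "-"',
]

for (ip, status), count in summarize(sample).most_common():
    print(ip, status, count)
```

`summarize` accepts any iterable of lines, so an open log file can be passed in directly. One request every five seconds from a single client, all answered with 444, is unlikely on its own to account for a gunicorn worker at 90% CPU.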
TOP
15284 root 20 0 298336 133468 18568 S 90.7 3.3 4:06.24 gunicorn
15477 lxd 20 0 216928 18104 15104 R 8.6 0.5 0:03.21 postgres
When I access the domain, CPU usage immediately hits the limit. I was implementing some new features, which worked like a charm locally, but I can't think of anything in them that would cause this. I also rolled everything back and had the same issue.
I am kind of stuck on where to look.
Would appreciate any hint.
PS: If you need more information, please let me know in the comments. I didn't want to make this post longer than necessary.
Thank you in advance!
EDIT with additional information
docker logs 7e3e93cda248 -f
Collect static files
254 static files copied to '/staticfiles', 756 post-processed.
Apply database migrations
System check identified some issues:
WARNINGS:
?: (urls.W005) URL namespace 'v1' isn't unique. You may not be able to reverse all URLs in this namespace
Operations to perform:
Apply all migrations: admin, app, auth, contenttypes, sessions
Running migrations:
No migrations to apply.
System check identified some issues:
WARNINGS:
?: (urls.W005) URL namespace 'v1' isn't unique. You may not be able to reverse all URLs in this namespace
No changes detected in app 'app'
System check identified some issues:
WARNINGS:
?: (urls.W005) URL namespace 'v1' isn't unique. You may not be able to reverse all URLs in this namespace
Operations to perform:
Apply all migrations: admin, app, auth, contenttypes, sessions
Running migrations:
No migrations to apply.
[2023-09-29 04:52:38 +0000] [10] [INFO] Starting gunicorn 21.2.0
[2023-09-29 04:52:38 +0000] [10] [INFO] Listening at: http://0.0.0.0:8000 (10)
[2023-09-29 04:52:38 +0000] [10] [INFO] Using worker: gthread
[2023-09-29 04:52:38 +0000] [11] [INFO] Booting worker with pid: 11
Not Found: /favicon.ico
Gunicorn command
gunicorn --pythonpath . app.wsgi:application --bind 0.0.0.0:8000 --timeout 120 --threads=3
VPS CPU
2 vCPUs
Solution
So it was obviously an application-level problem. I could have guessed that when I saw that the Postgres process was also consuming a lot of CPU.
A poorly performing ORM operation was the cause, and it somehow ended in a timeout.
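One way to track down a slow ORM operation like this is to time every SQL statement Django executes. The following is a rough sketch, not the code used in this case: it builds a wrapper with the signature that Django's `connection.execute_wrapper()` expects. The names `make_slow_query_logger` and `slow_queries` and the 0.5 second threshold are assumptions made for illustration.

```python
import logging
import time

logger = logging.getLogger("slow_queries")  # hypothetical logger name


def make_slow_query_logger(threshold_seconds=0.5):
    """Build a wrapper for Django's connection.execute_wrapper().

    The wrapper logs every statement that takes at least
    `threshold_seconds` (0.5 s is an arbitrary illustrative default).
    """

    def wrapper(execute, sql, params, many, context):
        started = time.perf_counter()
        try:
            # Hand the statement on to the real database call.
            return execute(sql, params, many, context)
        finally:
            elapsed = time.perf_counter() - started
            if elapsed >= threshold_seconds:
                logger.warning("slow query (%.2fs): %s", elapsed, sql)

    return wrapper


# Usage inside a Django view or shell (requires Django, not run here):
#   from django.db import connection
#   with connection.execute_wrapper(make_slow_query_logger(0.5)):
#       ...  # the code under suspicion
```

A statement that is still running when the gunicorn timeout kills the worker never reaches the `finally` block, so nothing is logged for it. For that case, server-side logging in Postgres (the `log_min_duration_statement` setting) may be the more reliable place to look.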
Answered By - Softwareentwicklung Freelancer Answer Checked By - Cary Denson (WPSolving Admin)