Issue
We are trying to configure Nginx for two purposes:
- Reverse proxy that forwards requests to a local Tomcat server (port 443 to port 10443, where Tomcat listens)
- Mirror requests to a backend server for analysis
Because performance was very low with the default configuration plus the mirror directive, we tried the reverse proxy alone to see whether it affects the server. It does: nginx seems to cap the traffic at roughly half (we use Locust and JMeter as load tools).
Nginx version: 1.19.4
We worked through 10-tips-for-10x-application-performance and Tuning NGINX for Performance to no avail. The machine that nginx and Tomcat run on should be strong enough (EC2 c5.4xlarge), and we do not see a lack of resources; it looks more like network capping. We see a very high count of TIME_WAIT connections (20k-40k).
From the machine perspective:
- Increased net port range (1024 65300)
- Lowered tcp_fin_timeout to 15 (the kernel takes this value in seconds)
- Increased the max file descriptor limit to the maximum
From the Nginx perspective (full nginx.conf below):
- keepalive_requests 100000; keepalive_timeout 1000;
- worker_processes 10 (16 is cpu count)
- worker_connections 3000;
- worker_rlimit_nofile 100000;
nginx.conf:
user nginx;
worker_processes 10;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
worker_rlimit_nofile 100000;
events {
    worker_connections 3000;
}
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for" '
                        '"$host" sn="$server_name" '
                        'rt=$request_time '
                        'ua="$upstream_addr" us="$upstream_status" '
                        'ut="$upstream_response_time" ul="$upstream_response_length" '
                        'cs=$upstream_cache_status';
    keepalive_requests 100000;
    keepalive_timeout 1000;
    ssl_session_cache shared:SSL:10m;
    sendfile on;
    #tcp_nopush on;
    #gzip on;
    include /etc/nginx/conf.d/*.conf;
    upstream local_host {
        server 127.0.0.1:10443;
        keepalive 128;
    }
    server {
        listen 443 ssl;
        ssl_certificate /etc/ssl/nginx/crt.pem;
        ssl_certificate_key /etc/ssl/nginx/key.pem;
        location / {
            # mirror /mirror;
            proxy_set_header Host $host;
            proxy_pass https://local_host$request_uri;
        }
        # Mirror configuration
        location = /mirror {
            internal;
            proxy_set_header Host test-backend-dns;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_connect_timeout 3s;
            proxy_read_timeout 100ms;
            proxy_send_timeout 100s;
            proxy_pass https://test-backend-ip:443$request_uri;
        }
    }
}
We also monitor with the Amplify agent. The connection count matches the expected requests and connections, but the actual request count is low. (Amplify monitor output)
This seems like a simple task for Nginx, but something is misconfigured. Thank you for your answers.
Solution
After many attempts to figure things out, we concluded that the application's response time was higher with nginx in front of it.
Our assumption, which also led to the fix, was that SSL termination was the cause. It is an expensive operation in both resources and time, and in the original setup it happened on both hops: nginx accepted HTTPS on 443 and then proxied over HTTPS to Tomcat.
What we did was make nginx (which can handle a much higher load than the ~4k RPS we hit it with) solely responsible for SSL termination, and we changed the Tomcat app configuration so that it listens for HTTP requests rather than HTTPS. This dramatically reduced the TIME_WAIT connections that were piling up and taking important resources from the server.
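As a rough illustration of that split, the sketch below has nginx terminate TLS on 443 and forward plain HTTP to Tomcat. The upstream name tomcat_http is hypothetical, port 8080 is taken from the Tomcat connector in this answer, and the certificate paths and keepalive 128 come from the question's config. The proxy_http_version and Connection lines are an assumption: the nginx documentation lists them as needed for upstream keepalive connections, but the answer does not say whether they were part of the final setup.

```nginx
# Sketch only: upstream name and the two keepalive-related proxy lines are assumptions.
upstream tomcat_http {
    server 127.0.0.1:8080;   # Tomcat's plain HTTP connector (no SSL)
    keepalive 128;           # idle upstream connections cached per worker
}

server {
    listen 443 ssl;          # TLS is handled here, and only here
    ssl_certificate     /etc/ssl/nginx/crt.pem;
    ssl_certificate_key /etc/ssl/nginx/key.pem;

    location / {
        proxy_http_version 1.1;          # assumption: lets upstream connections be reused
        proxy_set_header Connection "";  # assumption: do not send "Connection: close" upstream
        proxy_set_header Host $host;
        proxy_pass http://tomcat_http;   # plain HTTP to Tomcat instead of HTTPS
    }
}
```

If the two assumed lines are left out, nginx may open a new connection to Tomcat for each proxied request, which is one possible source of TIME_WAIT sockets on the loopback interface.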
Final configurations for nginx, tomcat & the kernel:
Linux machine configuration:
- /proc/sys/net/ipv4/ip_local_port_range - set to 1024 65535
(allows more ports, hence more connections)
- sysctl net.ipv4.tcp_timestamps=1
("...reduce performance spikes related to timestamp generation...")
- sysctl net.ipv4.tcp_tw_recycle=0
(This worked for us. Should be tested with/without tcp_tw_reuse. Note that tcp_tw_recycle was removed in Linux 4.12, so newer kernels do not have it.)
- sysctl net.ipv4.tcp_tw_reuse=1
(Same as tw_recycle: test with and without it)
- sysctl net.ipv4.tcp_max_tw_buckets=10000
(caps the number of TIME_WAIT sockets the kernel keeps)
Red Hat explanation for tcp_timeouts conf
Tomcat configuration:
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
          maxThreads="4000"
          minSpareThreads="10"
/>
<!-- A "Connector" using the shared thread pool - NO SSL -->
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           acceptCount="5000"
           pollerThreadCount="16"
           acceptorThreadCount="16"
           redirectPort="8443"
/>
Nginx-specific performance parameters:
main context:
- worker_processes auto;
- worker_rlimit_nofile 100000;
events context:
- worker_connections 10000; (we think this can be lower)
- multi_accept on;
http context:
- keepalive_requests 10000;
- keepalive_timeout 10s;
- access_log off;
- ssl_session_cache shared:SSL:10m;
- ssl_session_timeout 10m;
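For reference, here is how those parameters might sit in nginx.conf. This is only a sketch: the values are the ones listed above, and everything else (logging formats, includes, upstream and server blocks) is omitted rather than implied.

```nginx
# Main context
worker_processes auto;            # one worker per CPU core
worker_rlimit_nofile 100000;      # file descriptor limit per worker

events {
    worker_connections 10000;     # may be lower, as noted above
    multi_accept on;              # accept all pending connections at once
}

http {
    keepalive_requests 10000;     # requests allowed per client keepalive connection
    keepalive_timeout 10s;        # close idle client connections after 10 seconds
    access_log off;               # skip access logging under load
    ssl_session_cache shared:SSL:10m;   # TLS session cache shared across workers
    ssl_session_timeout 10m;            # how long a cached TLS session can be reused

    # upstream and server blocks go here
}
```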
It really helps to understand both sides of the equation: nginx and Tomcat.
We used JMX metrics to understand what is going on in Tomcat, alongside Prometheus metrics from our app, and the Amplify agent to monitor nginx behavior.
Hope this helps someone.
Answered By - Almog Answer Checked By - Willingham (WPSolving Volunteer)