Dear Maintainer,
When QUIC (HTTP/3) is enabled, connections sometimes fail after nginx
is reloaded. The symptom is slow site loading and/or client fallback to
TCP (HTTP/2 or HTTP/1.1). Testing with curl may produce, for example:
% curl --http3-only https://example.org
curl: (7) QUIC connection has been shut down
An examination suggests that the problem stems from nginx worker
processes continuing to hold QUIC UDP sockets while they are shutting
down. If such a worker process is handling a long-lived TCP session
(e.g. a websocket) for another HTTP server, the process may linger
indefinitely, and any QUIC UDP packets delivered to it during that time
go unanswered.
Here is a test case:
server {
    listen 443 quic reuseport default_server;
    server_name _;
    ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;
    location / {
        return 200 "OK\n";
    }
}
server {
    listen 80 default_server;
    server_name _;
    location / {
        proxy_pass http://localhost:8080;
        proxy_read_timeout 1h;
    }
}
Start from a freshly restarted nginx:
# systemctl restart nginx.service
Simulate the proxy destination in a separate terminal:
% nc -l 8080
Demonstrate working QUIC:
% curl --http3-only --insecure https://127.0.0.1
OK
% curl --http3-only --insecure https://127.0.0.1
OK
% curl --http3-only --insecure https://127.0.0.1
OK
% curl --http3-only --insecure https://127.0.0.1
OK
Initiate a long-running TCP session in a separate terminal:
% curl http://127.0.0.1
Reload nginx:
# systemctl reload nginx.service
Demonstrate the problem:
% curl --http3-only --insecure https://127.0.0.1
OK
% curl --http3-only --insecure https://127.0.0.1
curl: (7) QUIC connection has been shut down
% curl --http3-only --insecure https://127.0.0.1
OK
% curl --http3-only --insecure https://127.0.0.1
curl: (7) QUIC connection has been shut down
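To get a rough failure count instead of eyeballing individual runs, the
requests above can be wrapped in a loop. This is only a minimal sketch:
the helper name count_failures and the run count in the usage line are
arbitrary choices for illustration, not part of nginx or curl.

```shell
# Hypothetical helper: run a command N times and report how many runs failed.
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status of the command matters here.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed/$runs failed"
}

# Usage against the test server from this report (nginx must be running):
# count_failures 20 curl --http3-only --insecure --silent https://127.0.0.1
```

The function relies on curl exiting non-zero when the QUIC connection
is shut down, as in the error shown above.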
Confirm the QUIC listening socket is held by a worker process that is
shutting down:
# ss -ulnpH 'sport = 443'
UNCONN 0 0 0.0.0.0:443 0.0.0.0:* users:(("nginx",pid=258918,fd=7),("nginx",pid=258917,fd=7),("nginx",pid=257943,fd=7),("nginx",pid=257942,fd=7))
UNCONN 0 0 0.0.0.0:443 0.0.0.0:* users:(("nginx",pid=258918,fd=5),("nginx",pid=258917,fd=5),("nginx",pid=257943,fd=5),("nginx",pid=257942,fd=5))
% ps 257943
PID TTY STAT TIME COMMAND
257943 ? S 0:00 nginx: worker process is shutting down
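The two manual steps above (ss, then ps on each PID) can be combined.
The following is a sketch under the assumption that ss prints owners in
the users:(("name",pid=N,fd=M),...) form shown above; the helper names
pids_from_ss and show_udp_holders are made up for this example.

```shell
# Hypothetical helper: extract the unique PIDs from `ss -p` output on stdin.
pids_from_ss() {
    grep -o 'pid=[0-9]*' | cut -d= -f2 | sort -un
}

# Hypothetical helper: print each PID holding a UDP socket on the given
# local port, followed by that process's current title.
show_udp_holders() {
    port=$1
    ss -ulnpH "sport = $port" | pids_from_ss | while read -r pid; do
        printf '%s %s\n' "$pid" "$(ps -o args= -p "$pid")"
    done
}

# Usage (as root, so that ss can report the owning processes):
# show_udp_holders 443 | grep 'shutting down'
```

Any line that survives the grep is a worker that is shutting down while
still holding the QUIC socket.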
Once all the long-running TCP sessions have ended and the lingering
worker processes finally exit, the problem goes away.
Similar reports may or may not be related:
- https://github.com/nginx/nginx/issues/425
- https://github.com/nginx/nginx/issues/1399
I am not certain how to fix this, though perhaps a worker process
should close its listening UDP sockets as soon as it starts shutting
down and no longer needs them.