The heartbeat murder: diagnosing a 502 Bad Gateway
Debugging CPU-bound PyTorch inference that caused Gunicorn worker timeouts, SIGKILLs, and HTTP 502 Bad Gateway responses.
The symptom
I ran a load simulation against a local multi-pod Kubernetes cluster: 100 ResNet-50 inference requests, sent in 2 waves of 50 at a concurrency of 4. 8 of the 100 requests failed.
The breakdown:
| Status | Count |
|---|---|
| HTTP 502 Bad Gateway | 7 |
| HTTP N/A (socket drop) | 1 |
The cluster did not crash. The VM stayed up.
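A load run of this shape can be sketched in a few lines of Python. This is a minimal reconstruction, not the tool I actually used: the function names, the URL in the usage comment, and the 60-second client timeout are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def send_request(url: str, payload: bytes, timeout: float = 60.0) -> str:
    """POST once; return the HTTP status as a string, or "N/A" if no status arrived."""
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as exc:
        # 4xx and 5xx responses (including 502) still carry a status code.
        return str(exc.code)
    except OSError:
        # Connection resets, refused connections, and socket timeouts.
        return "N/A"


def run_waves(send, waves: int = 2, per_wave: int = 50, concurrency: int = 4) -> Counter:
    """Call send() per_wave times per wave, at most `concurrency` at once; tally results."""
    tally: Counter = Counter()
    for _ in range(waves):
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            tally.update(pool.map(lambda _: send(), range(per_wave)))
    return tally


# Hypothetical usage; the ingress URL and image file are placeholders:
# image = open("cat.jpg", "rb").read()
# print(run_waves(lambda: send_request("http://localhost/predict", image)))
```

Keeping the status as a string lets socket-level failures share the tally with real HTTP codes, which is how a row like "N/A" ends up next to 502 in the breakdown.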
The diagnostic path
Layer 1: the ingress controller
I checked the ingress-nginx-controller logs first.
[error] upstream prematurely closed connection while reading
response header from upstream,
upstream: "http://10.244.0.44:8000/predict"
This was not a gateway timeout, which NGINX would have reported as
HTTP 504. The application pod at 10.244.0.44 closed the TCP
connection mid-request. Successful requests through that same pod
were taking up to 9.74 seconds.
Layer 2: the pod logs
I ran kubectl get pods -o wide and matched 10.244.0.44 to
simple-model-api-deployment-6d694f7ddc-clvw7.
RESTARTS was 0, so the container had not been restarted and the kernel OOM killer was not involved. I checked the pod logs directly:
[CRITICAL] WORKER TIMEOUT (pid:11)
[ERROR] Worker (pid:11) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 16
[CRITICAL] WORKER TIMEOUT (pid:10)
[ERROR] Worker (pid:10) was sent SIGKILL!
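When the logs are longer than a few lines, the same check can be scripted. The sketch below pulls the worker pids out of log text shaped like the lines above; the regular expressions are written against those lines only, and the function name is hypothetical.

```python
import re

# Patterns matched against the Gunicorn messages quoted above.
TIMEOUT_RE = re.compile(r"WORKER TIMEOUT \(pid:(\d+)\)")
SIGKILL_RE = re.compile(r"Worker \(pid:(\d+)\) was sent SIGKILL")


def killed_workers(log_text: str) -> dict[str, list[int]]:
    """Return the pids that timed out and the pids that were sent SIGKILL, in log order."""
    return {
        "timed_out": [int(pid) for pid in TIMEOUT_RE.findall(log_text)],
        "sigkilled": [int(pid) for pid in SIGKILL_RE.findall(log_text)],
    }
```

If every pid in "sigkilled" also appears in "timed_out", the kills came from the Gunicorn master and not from the kernel.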
What happened
Gunicorn runs as a master process. Here it managed two Uvicorn ASGI
workers, pid:10 and pid:11. Each worker has to send the master a
periodic heartbeat, and the master kills any worker that stays
silent for longer than --timeout, which defaults to 30 seconds.
PyTorch's ResNet-50 forward pass is CPU-bound. It occupied the worker's thread for the entire matrix computation, so the worker's event loop could not run and no heartbeat was sent.
Gunicorn assumed the workers were deadlocked and sent SIGKILL.
NGINX was holding an open socket waiting for a response header
from a process that no longer existed. It returned 502.
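The starvation mechanism is easy to reproduce without Gunicorn or PyTorch. The sketch below is an illustration only: time.sleep stands in for the forward pass, a small asyncio task stands in for the heartbeat, and every name and constant is hypothetical. It measures the longest silence between heartbeats when the blocking call runs on the event loop and when it is handed to a thread.

```python
import asyncio
import time

BLOCK_SECONDS = 0.3   # stand-in for a slow forward pass (assumed value)
BEAT_INTERVAL = 0.02  # how often the heartbeat task tries to run (assumed value)


def blocking_inference() -> None:
    """Stand-in for a synchronous model call; it never yields to the event loop."""
    time.sleep(BLOCK_SECONDS)


async def heartbeat(beats: list[float], stop: asyncio.Event) -> None:
    """Record a timestamp every BEAT_INTERVAL, whenever the loop lets this task run."""
    while not stop.is_set():
        beats.append(time.monotonic())
        await asyncio.sleep(BEAT_INTERVAL)


async def longest_gap(offload: bool) -> float:
    """Run one blocking call and return the longest silence between heartbeats."""
    beats: list[float] = []
    stop = asyncio.Event()
    task = asyncio.create_task(heartbeat(beats, stop))
    await asyncio.sleep(BEAT_INTERVAL * 2)       # let the heartbeat start
    if offload:
        await asyncio.to_thread(blocking_inference)  # loop stays free
    else:
        blocking_inference()                         # loop is frozen
    await asyncio.sleep(BEAT_INTERVAL * 2)       # let it beat again afterwards
    stop.set()
    await task
    return max(later - earlier for earlier, later in zip(beats, beats[1:]))


# Example: print(asyncio.run(longest_gap(offload=False)))
```

With offload=False the longest gap is at least BLOCK_SECONDS, because nothing else can run while the call holds the loop. Whether moving a real forward pass to a thread helps in the same way depends on how much of that call releases the GIL, so treat this as a picture of the failure, not a tested fix.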
The fix
gunicorn main:app \
--workers 1 \
--worker-class uvicorn.workers.UvicornWorker \
--timeout 120
Two changes:
| Setting | Before | After | Reason |
|---|---|---|---|
| --timeout | 30s | 120s | PyTorch inference takes up to 10s per request |
| --workers | default | 1 | Let Kubernetes handle horizontal scaling, not Gunicorn |
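The same settings can also live in a Gunicorn config file, which is a plain Python module. This is a sketch of an equivalent file, assuming it is saved as gunicorn.conf.py next to main.py; it adds nothing beyond the three flags in the command above.

```python
# gunicorn.conf.py: Python equivalent of the command-line flags above.

# One worker per pod; Kubernetes replicas handle horizontal scaling.
workers = 1

# Run the ASGI app through Uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Seconds a worker may stay silent before the master kills it (default is 30).
timeout = 120
```

Start the server with gunicorn -c gunicorn.conf.py main:app so the file is picked up explicitly.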
Web server timeout defaults assume short, I/O-bound requests. PyTorch inference is neither short nor I/O-bound.