Transmission 006 · 2026-06-05

The heartbeat murder: diagnosing a 502 Bad Gateway

Debugging PyTorch CPU-bound inference causing Gunicorn worker SIGKILL timeouts and HTTP 502 Bad Gateway responses.

The symptom

I ran a load simulation: 100 ResNet-50 inference requests in 2 waves of 50, at a concurrency of 4, against a local multi-pod Kubernetes cluster. 8 requests failed.
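A minimal sketch of a load driver with that shape, for readers who want to reproduce the pattern. This is not the script used for the run above: `run_waves` and `send` are hypothetical names, and `send` stands in for whatever function posts one image to the prediction endpoint and returns the HTTP status code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_waves(send, waves=2, per_wave=50, concurrency=4):
    """Send waves * per_wave requests, at most `concurrency` in flight, and tally outcomes.

    `send` is a hypothetical callable: it takes a request index and returns an
    HTTP status code. Any exception it raises is tallied as a socket drop.
    """
    tally = Counter()

    def attempt(index):
        try:
            return f"HTTP {send(index)}"
        except Exception:
            return "HTTP N/A (socket drop)"

    for wave in range(waves):
        first = wave * per_wave
        # A fresh pool per wave: the wave finishes before the next one starts.
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            tally.update(pool.map(attempt, range(first, first + per_wave)))
    return tally


if __name__ == "__main__":
    def stub_send(index):
        # Stand-in only; a real run would POST to the service here.
        return 200

    print(run_waves(stub_send))
```

Keeping `send` injectable means the tallying logic can be checked without a cluster, and the same driver works with any HTTP client.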

The breakdown:

Status                    Count
HTTP 502 Bad Gateway      7
HTTP N/A (socket drop)    1

The cluster did not crash. The VM stayed up.


The diagnostic path

Layer 1: the ingress controller

I checked the ingress-nginx-controller logs first.

[error] upstream prematurely closed connection while reading
response header from upstream,
upstream: "http://10.244.0.44:8000/predict"

This was not a gateway timeout, which would have produced HTTP 504. The application pod at 10.244.0.44 closed the TCP connection mid-request. Successful requests through that same pod were taking up to 9.74 seconds.
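When there are many such entries, pulling the upstream address out by hand gets tedious. Here is a small sketch that extracts it from an entry shaped like the one above; `upstream_target` is a hypothetical helper, and the regular expression assumes the `upstream: "..."` field appears exactly as in this log.

```python
import re
from urllib.parse import urlsplit

# Matches the quoted URL that follows `upstream:` in an NGINX error entry.
UPSTREAM_RE = re.compile(r'upstream: "(?P<url>[^"]+)"')


def upstream_target(log_entry):
    """Return (host, port, path) of the upstream named in the entry, or None."""
    match = UPSTREAM_RE.search(log_entry)
    if match is None:
        return None
    parts = urlsplit(match.group("url"))
    return parts.hostname, parts.port, parts.path


entry = (
    "[error] upstream prematurely closed connection while reading "
    'response header from upstream, upstream: "http://10.244.0.44:8000/predict"'
)
print(upstream_target(entry))  # ('10.244.0.44', 8000, '/predict')
```

The host it returns is the pod IP to look up in the next step.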


Layer 2: the pod logs

I ran kubectl get pods -o wide and matched 10.244.0.44 to simple-model-api-deployment-6d694f7ddc-clvw7.

RESTARTS: 0, so the kernel OOM killer had not taken the container down. I checked the pod logs directly:

[CRITICAL] WORKER TIMEOUT (pid:11)
[ERROR] Worker (pid:11) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 16
[CRITICAL] WORKER TIMEOUT (pid:10)
[ERROR] Worker (pid:10) was sent SIGKILL!

What happened

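Gunicorn runs as a master process that manages two Uvicorn ASGI workers, pid:10 and pid:11. Each worker has to signal regularly that it is alive, and the master kills any worker that stays silent for longer than the timeout, which is 30 seconds by default.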

PyTorch’s ResNet-50 forward pass is CPU-bound. While the matrix computation ran, the worker was fully occupied and could not do anything else, including sending its heartbeat to Gunicorn.

Gunicorn assumed they were deadlocked and sent SIGKILL.

NGINX was holding an open socket waiting for a response header from a process that no longer existed. It returned 502.
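The starvation effect can be reproduced in a toy model. The sketch below is not Gunicorn or Uvicorn code: it assumes the heartbeat is scheduled on the same event loop that handles requests, and it uses `time.sleep` as a stand-in for a synchronous forward pass. All function names are hypothetical, and the durations are scaled down from seconds to fractions of a second.

```python
import asyncio
import time


async def longest_heartbeat_gap(work, interval=0.02):
    """Run `work` next to a heartbeat task and return the longest gap between beats."""
    beats = [time.monotonic()]

    async def heartbeat():
        while True:
            await asyncio.sleep(interval)
            beats.append(time.monotonic())

    task = asyncio.create_task(heartbeat())
    await work()
    await asyncio.sleep(interval * 2)  # let one more beat land after the work
    task.cancel()
    return max(later - earlier for earlier, later in zip(beats, beats[1:]))


async def io_style_wait():
    await asyncio.sleep(0.3)  # yields to the event loop while waiting


async def cpu_style_block():
    time.sleep(0.3)  # stand-in for a synchronous forward pass; never yields


async def main():
    ok = await longest_heartbeat_gap(io_style_wait)
    starved = await longest_heartbeat_gap(cpu_style_block)
    return ok, starved


ok_gap, starved_gap = asyncio.run(main())
print(f"I/O-style wait: longest gap {ok_gap:.2f}s")
print(f"blocking call:  longest gap {starved_gap:.2f}s")
```

With the I/O-style wait the heartbeat keeps ticking at roughly its interval. With the blocking call the gap stretches to at least the length of the call, which is the condition a supervisor with a fixed timeout reads as a hung worker.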


The fix

gunicorn main:app \
  --workers 1 \
  --worker-class uvicorn.workers.UvicornWorker \
  --timeout 120

Two changes:

Setting      Before     After    Reason
--timeout    30s        120s     PyTorch inference takes up to 10s per request
--workers    default    1        Let Kubernetes handle horizontal scaling, not Gunicorn
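The same settings can also live in a Gunicorn configuration file, which is plain Python. A sketch, assuming the file is named gunicorn.conf.py and sits next to main.py; the three setting names are Gunicorn's own, and the values mirror the command-line flags above.

```python
# gunicorn.conf.py: a sketch mirroring the command-line flags above.

# One worker per pod; Kubernetes replicas do the horizontal scaling.
workers = 1

# ASGI worker class, same as --worker-class on the command line.
worker_class = "uvicorn.workers.UvicornWorker"

# Seconds a worker may stay silent before the master kills it (was 30).
timeout = 120
```

It would be started with `gunicorn -c gunicorn.conf.py main:app`, which keeps the timeout under version control alongside the application code.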

Web timeout defaults assume short, I/O-bound requests. PyTorch inference is neither.
