The heartbeat murder: diagnosing a 502 Bad Gateway

The symptom

I ran a load simulation: 100 ResNet-50 inference requests, sent in 2 waves of 50 at a concurrency of 4, against a local multi-pod Kubernetes cluster. 8 requests failed.
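For readers who want to reproduce a run of this shape, here is a minimal sketch of a wave-based load generator using only the standard library. It is not the tool used for the run above: the function names and defaults are assumptions for illustration, and the URL and payload are left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib import error, request


def send_one(url: str, payload: bytes, timeout: float = 60.0) -> str:
    """POST once; return the HTTP status as a string, or 'N/A' if no HTTP response arrived."""
    req = request.Request(url, data=payload, method="POST")
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return str(resp.status)
    except error.HTTPError as exc:        # the server answered with 4xx/5xx
        return str(exc.code)
    except (error.URLError, OSError):     # refused, reset, dropped socket, timeout
        return "N/A"


def run_waves(send, waves: int = 2, per_wave: int = 50, concurrency: int = 4) -> dict:
    """Call `send()` per_wave times per wave, with at most `concurrency` calls in flight."""
    counts: dict = {}
    for _ in range(waves):
        # Leaving the `with` block waits for the whole wave before the next one starts.
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            for status in pool.map(lambda _i: send(), range(per_wave)):
                counts[status] = counts.get(status, 0) + 1
    return counts
```

Calling `run_waves(lambda: send_one(url, payload))` with your own endpoint and request body returns a tally keyed by status, in the same shape as the breakdown table below.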

The breakdown:

Status	Count
HTTP 502 Bad Gateway	7
HTTP N/A (socket drop)	1

The cluster did not crash. The VM stayed up.

The diagnostic path

Layer 1: the ingress controller

I checked the ingress-nginx-controller logs first.

upstream connection killed

[error] upstream prematurely closed connection while reading
response header from upstream,
upstream: "http://10.244.0.44:8000/predict"

This was not a timeout, which NGINX would have reported as HTTP 504. The application pod at 10.244.0.44 closed the TCP connection mid-request. Successful requests through that same pod were taking up to 9.74 seconds.
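When there are many such entries, pulling the upstream address out by hand gets tedious. The sketch below does it with a regular expression and separates the two failure modes by the message text NGINX writes. The helper names are hypothetical, and the pattern assumes the `upstream: "http://host:port/path"` form shown in the log entry above.

```python
import re

# Matches the upstream field as it appears in the NGINX error entry above.
UPSTREAM_RE = re.compile(r'upstream: "https?://(?P<host>[^:/"]+):(?P<port>\d+)(?P<path>/[^"]*)?"')


def parse_upstream(entry: str):
    """Return (host, port, path) for the upstream in an error entry, or None if absent."""
    match = UPSTREAM_RE.search(entry)
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), match.group("path") or "/"


def classify(entry: str) -> str:
    """Tell a dropped upstream connection (502) apart from an upstream timeout (504)."""
    if "upstream prematurely closed connection" in entry:
        return "closed by upstream (502)"
    if "upstream timed out" in entry:
        return "upstream timeout (504)"
    return "unknown"
```

Both functions search the whole string, so they work whether the entry arrives on one line or wrapped across several.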

Layer 2: the pod logs

I ran kubectl get pods -o wide and matched 10.244.0.44 to simple-model-api-deployment-6d694f7ddc-clvw7.
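The same lookup can be scripted against the JSON form of the pod list. This is a small sketch, assuming the output of `kubectl get pods -o json` is passed in as a string; the function name is hypothetical.

```python
import json


def pod_name_for_ip(pods_json: str, ip: str):
    """Return the name of the pod whose status.podIP equals `ip`, or None if no pod matches."""
    for item in json.loads(pods_json).get("items", []):
        if item.get("status", {}).get("podIP") == ip:
            return item["metadata"]["name"]
    return None
```

Pods that have not been assigned an IP yet have no `status.podIP` field, which is why the lookup uses `.get()` rather than direct indexing.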

RESTARTS was 0, so the container itself had not been OOM-killed and restarted. I checked the pod logs directly:

workers killed by heartbeat

[CRITICAL] WORKER TIMEOUT (pid:11)
[ERROR] Worker (pid:11) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 16
[CRITICAL] WORKER TIMEOUT (pid:10)
[ERROR] Worker (pid:10) was sent SIGKILL!

What happened

Gunicorn runs as a master process. It manages two Uvicorn ASGI workers, pid:10 and pid:11. Each worker has to send the master a periodic heartbeat. If a worker stays silent for longer than the timeout, 30 seconds by default, the master treats it as hung.

PyTorch’s ResNet-50 forward pass is CPU-bound, and it ran synchronously inside the worker. For as long as the matrix computation held the worker’s thread, nothing else in that worker could run, including the code that answers Gunicorn’s heartbeat check.

Gunicorn assumed they were deadlocked and sent SIGKILL.

NGINX was holding an open socket waiting for a response header from a process that no longer existed. It returned 502.
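The starved heartbeat can be reproduced in miniature. The sketch below is a toy model, not Gunicorn's or Uvicorn's actual code: it assumes the heartbeat is sent from the same asyncio event loop that serves requests, uses `time.sleep` as a stand-in for a synchronous forward pass, and scales the durations down to fractions of a second. All names are hypothetical.

```python
import asyncio
import time


async def heartbeat(beats: list, interval: float) -> None:
    """Record a timestamp every `interval` seconds, like a worker notifying its master."""
    while True:
        beats.append(time.monotonic())
        await asyncio.sleep(interval)


async def longest_gap(blocking: bool, work: float = 0.5, interval: float = 0.05) -> float:
    """Run one simulated request and return the longest silence between heartbeats."""
    beats: list = []
    task = asyncio.create_task(heartbeat(beats, interval))
    await asyncio.sleep(interval * 2)      # let a few heartbeats land first
    if blocking:
        time.sleep(work)                   # synchronous work: the event loop is frozen
    else:
        await asyncio.sleep(work)          # I/O-style wait: the loop keeps running
    await asyncio.sleep(interval * 2)      # let the heartbeat resume
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return max(later - earlier for earlier, later in zip(beats, beats[1:]))
```

With `blocking=True` the longest gap is at least as long as the simulated work, while with `blocking=False` it stays close to the heartbeat interval. A supervisor that kills any worker silent for longer than its timeout would only ever fire in the first case.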

The fix

gunicorn main:app \
  --workers 1 \
  --worker-class uvicorn.workers.UvicornWorker \
  --timeout 120

Two changes:

Setting	Before	After	Reason
`--timeout`	30s	120s	PyTorch inference takes up to 10s per request
`--workers`	2	1	Let Kubernetes handle horizontal scaling, not Gunicorn
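The same settings can also be kept in a Gunicorn configuration file, which is plain Python. This is a sketch of the equivalent of the command above, assuming the file is named `gunicorn.conf.py` and sits next to the application; that file name and location are my assumption, not something the deployment above is known to use.

```python
# gunicorn.conf.py (assumed file name; mirrors the command-line flags above)

# One worker per pod; Kubernetes replicas provide the horizontal scaling.
workers = 1

# Run the ASGI app under Uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Seconds a worker may stay silent before the master kills it.
timeout = 120
```

Starting the server with `gunicorn -c gunicorn.conf.py main:app` should then behave like the flag-based command.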

Web server timeout defaults assume short, I/O-bound requests. PyTorch inference is long and CPU-bound.