ResNet-50 takes 60 seconds to load. The cluster didn't know that.

The constraint

When I started the machine this morning, the cluster was stopped, and kubectl get pods refused the connection entirely. After minikube start, the ingress controller spent several minutes failing to retrieve its leader-election lease from the API server. I ran the load test too early.

50 requests. 50 failures. Every single one a socket drop.

total cluster failure

10:37:08 │ INFO │ Progress:  5/50 (10%) —  0 ok,  5 err
10:37:08 │ INFO │ Progress: 10/50 (20%) —  0 ok, 10 err
10:37:10 │ INFO │ Progress: 15/50 (30%) —  0 ok, 15 err
...
10:37:16 │ INFO │ Progress: 50/50 (100%) — 0 ok, 50 err

║ Successes............................... 0     ║
║ Failures................................ 50    ║
║ Success Rate............................ 0.0%  ║
║ Error Code Breakdown:                         ║
║   HTTP N/A: 50 occurrence(s)                  ║

The ingress controller logs confirmed the problem: its leader-election lease requests to the API server were timing out during the restart window.

etcd lease timeout

E0608 07:25:56.623698  6 leaderelection.go:452] "Error retrieving lease lock"
  err="Get https://10.96.0.1:443/.../ingress-nginx-leader: context deadline exceeded"

Once the control plane stabilised, I ran the test again. 49 of 50 succeeded. One HTTP 502 slipped through.

one 502 still leaking

║ Successes............................... 49    ║
║ Failures................................ 1     ║
║ Success Rate............................ 98.0% ║
║ Error Code Breakdown:                         ║
║   HTTP 502: 1 occurrence(s)                   ║

What happened

I matched the 502 to a specific pod. The ingress access log pointed at 10.244.0.57.

502 traced to pod

10.244.0.1 - - [08/Jun/2026:07:53:40 +0000] "POST /predict HTTP/1.1" 502 150
  [default-simple-model-api-service-8000] [] 10.244.0.57:8000 0 2.757 502

kubectl get pods -o wide matched that IP to simple-model-api-deployment-56b5cd5474-w54kn. I pulled its logs.

59 second startup gap

[2026-06-08 07:24:01 +0000] [1]  [INFO] Starting gunicorn 22.0.0
[2026-06-08 07:24:02 +0000] [10] [INFO] Waiting for application startup.
[2026-06-08 07:24:02 +0000] [11] [INFO] Waiting for application startup.
[2026-06-08 07:25:01 +0000] [10] [INFO] Application startup complete.
[2026-06-08 07:25:01 +0000] [11] [INFO] Application startup complete.

The workers started at 07:24:02 and finished loading at 07:25:01. That is 59 seconds, spent while PyTorch loads the full ResNet-50 weights at startup. For that whole window the pod sat in the Running state without being ready to serve traffic. Nothing held requests back during the rolling update, so one reached the pod before it could answer and came back as a 502.

The Deployment had no startup probe. Kubernetes had no way to tell “container is running” from “model is loaded.”

The pods also had no memory limits. Five pods initialising ResNet-50 in parallel puts significant pressure on the host. One misconfigured rollout and the whole machine runs out of memory.

The resolution

I enabled the metrics server, then applied three manifests: a ConfigMap for environment variables, an updated Deployment with resource boundaries and lifecycle probes, and an HPA.

manifests applied clean

minikube addons enable metrics-server
kubectl apply -f kubernetes/configmap.yaml
kubectl apply -f kubernetes/deployment.yaml
kubectl apply -f kubernetes/hpa.yaml

The resource boundaries and autoscaling targets:

Setting	Value	Reason
`requests.memory`	1Gi	Reserves room for PyTorch weights at load time
`limits.memory`	1.5Gi	Caps growth so parallel startups don’t OOM the host
HPA CPU target	70%	Triggers scale-out during inference load
HPA memory target	80%	Triggers scale-out if weight caching grows
Min replicas	3	Keeps a baseline for the HPA to work with
Max replicas	10	Bounds the cluster to what the host can support
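
In manifest form, the two memory rows of the table might look like the fragment below. This is a sketch of a container-level resources block, not the actual contents of kubernetes/deployment.yaml. The CPU request is a hypothetical value, included only because an HPA CPU utilisation target is calculated against the container's CPU request.

```yaml
# Fragment of a container spec inside the Deployment (sketch).
# Only the two memory values come from the table above.
resources:
  requests:
    memory: "1Gi"      # reserves room for the PyTorch weights at load time
    cpu: "500m"        # hypothetical; the post does not state a CPU request
  limits:
    memory: "1536Mi"   # 1.5Gi, caps growth during parallel startups
```

A container that exceeds its memory limit is OOM-killed on its own, which is what keeps one bad rollout from exhausting the host.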

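The startup probe gives each pod up to 90 seconds to pass its /health check. Until it passes, Kubernetes holds off the readiness and liveness probes and sends the pod no traffic; if it never passes, the container is restarted. After that, readiness and liveness probes run against the same /health endpoint.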
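
A probe configuration with that 90-second budget could be sketched as follows. The /health path and port 8000 come from the logs in this post; the individual timing values are assumptions, chosen so that failureThreshold multiplied by periodSeconds comes to 90 seconds.

```yaml
# Fragment of a container spec inside the Deployment (sketch).
# Path and port are from the post; period and threshold values are assumed.
startupProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5        # assumed polling interval
  failureThreshold: 18    # 18 x 5s = 90s before the container is restarted
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5        # assumed; gates Service traffic after startup
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10       # assumed; restarts the container if it hangs
```

With a 59-second load time, a 90-second budget leaves roughly half a minute of headroom for slower starts when several pods load weights at once.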

I rebuilt the image inside the Minikube Docker daemon so the cluster picked it up without a registry pull, then triggered a rolling restart and watched it finish cleanly.

rebuild and rollout clean

eval $(minikube docker-env)
docker build -t simple-model-api:latest .
kubectl rollout restart deployment/simple-model-api-deployment
kubectl rollout status deployment/simple-model-api-deployment --timeout=300s

rollout confirmed

deployment "simple-model-api-deployment" successfully rolled out

After the rollout, the HPA reported both metrics below their targets.

HPA metrics healthy

$ kubectl get hpa simple-model-api-hpa
NAME                   REFERENCE                                TARGETS                        MINPODS   MAXPODS   REPLICAS
simple-model-api-hpa   Deployment/simple-model-api-deployment   cpu: 1%/70%, memory: 73%/80%   3         10        5
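
An HPA manifest consistent with that output might look like the sketch below. The names, replica bounds and targets are taken from the table and the kubectl output in this post; the use of the autoscaling/v2 API is an assumption, and the actual kubernetes/hpa.yaml may differ.

```yaml
# Sketch of an HPA matching the targets shown above (autoscaling/v2 assumed).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-model-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple-model-api-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # percent of the memory request
```

Both utilisation figures are percentages of the pods' resource requests, so the 73% memory reading is relative to the 1Gi request, not the 1.5Gi limit.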

A port-forward smoke test confirmed the application was healthy end to end.

health check 200

$ curl http://localhost:8080/health
{"success":true,"status":"healthy","model":"ResNet-50","version":"1.0.0"}

The cluster now knows ResNet-50 is slow to wake up, and it waits.