Troubleshooting Kubernetes: CrashLoopBackOff
In my day-to-day work, I help teams move to containers and cloud technologies, and I have observed that Kubernetes can be overwhelming for newcomers.
One reason for this complexity is that troubleshooting what went wrong can be difficult if you don't know where to look, and you often need to look in more than one place.
I hope this series of blog posts about troubleshooting Kubernetes will help people in their journey.
- Troubleshooting Kubernetes: ImagePullBackOff
- Troubleshooting Kubernetes: Pods in Pending State
- Troubleshooting Kubernetes: CrashLoopBackOff
Troubleshooting Pods in CrashLoopBackOff State
You deployed your Deployment (StatefulSet or other) and the underlying pod(s) is in the CrashLoopBackOff state.
But what does this mean? It means your pod is stuck in a starting, crashing loop.
First, let's use the kubectl describe command. This will give you additional information about the pod. The output can be long, so let's jump to the Events section:
> kubectl get pod
NAME           READY   STATUS             RESTARTS      AGE
crashing-pod   0/1     CrashLoopBackOff   2 (17s ago)   48s

> kubectl describe pod crashing-pod
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  23s                default-scheduler  Successfully assigned default/crashing-pod to worker-node01
  Normal   Pulled     23s                kubelet            Successfully pulled image "alpine" in 875.741934ms
  Normal   Pulling    18s (x2 over 24s)  kubelet            Pulling image "alpine"
  Normal   Created    17s (x2 over 23s)  kubelet            Created container crashing-pod
  Normal   Started    17s (x2 over 23s)  kubelet            Started container crashing-pod
  Normal   Pulled     17s                kubelet            Successfully pulled image "alpine" in 1.300456734s
  Warning  BackOff    12s                kubelet            Back-off restarting failed container
From the Events section, we can see that the pod was created, started, and then fell into the back-off state:
Warning  BackOff  12s  kubelet  Back-off restarting failed container
This tells us that the container keeps failing and Kubernetes is restarting it.
It most likely means your container exited. As you may know, a container must keep PID 1 running, or it terminates.
Kubernetes will keep trying to restart it, with an exponentially increasing back-off delay. You can see the RESTARTS counter increase as Kubernetes restarts the container.
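To make the back-off concrete: the delay roughly doubles after each crash, starting around 10 seconds and capping at five minutes (these values match the documented kubelet behavior, but may vary by version). A quick sketch of the schedule:

```shell
# Sketch of the kubelet restart back-off schedule.
# Assumed values: 10s initial delay, doubling each restart, capped at 300s.
delay=10
for restart in 1 2 3 4 5 6 7; do
  echo "restart $restart: wait ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```

This also explains why a crashing pod can sit idle for minutes between restarts: after a handful of crashes, the kubelet waits the full five minutes before trying again. The timer resets once the container runs successfully for a while.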
> kubectl get pods
NAME           READY   STATUS             RESTARTS       AGE
crashing-pod   0/1     CrashLoopBackOff   6 (3m47s ago)  10m
At this point, we know our pod is stuck in a starting, exiting loop. Let's now get the logs of the pod (if the container has already restarted, you can add the --previous flag to see the logs of the crashed instance):
> kubectl logs crashing-pod
Hello World
Something bad is happening...
To keep the example simple, the container just outputs some text and then exits.
In a real application, you will hopefully have more useful logs telling you why it is exiting, or at least a clue.
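For completeness, a minimal pod that reproduces this crash loop could look like the following. This manifest is a reconstruction for illustration, not the exact one behind the output above; only the pod name and image are taken from the example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: crashing-pod
spec:
  containers:
    - name: crashing-pod
      image: alpine
      # PID 1 prints two lines and exits with a non-zero code,
      # so the kubelet restarts it and the pod ends up in CrashLoopBackOff.
      command: ["/bin/sh", "-c", "echo 'Hello World'; echo 'Something bad is happening...'; exit 1"]
```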
An application misbehaving like this is one of the most common causes of the CrashLoopBackOff state.
Another possibility is that the liveness probe is not returning a successful status, causing Kubernetes to kill and restart the pod.
You will see the same CrashLoopBackOff state, but the describe command will give you a different output:
NAME                 READY   STATUS             RESTARTS     AGE
liveness-fails-pod   0/1     CrashLoopBackOff   4 (12s ago)  3m27s
Let's check the events:
...
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m56s                  default-scheduler  Successfully assigned default/liveness-fails-pod to worker-node01
  Normal   Pulled     3m55s                  kubelet            Successfully pulled image "alpine" in 883.359685ms
  Normal   Pulled     3m17s                  kubelet            Successfully pulled image "alpine" in 857.272018ms
  Normal   Created    2m37s (x3 over 3m55s)  kubelet            Created container liveness-fails-pod
  Normal   Started    2m37s (x3 over 3m55s)  kubelet            Started container liveness-fails-pod
  Normal   Pulled     2m37s                  kubelet            Successfully pulled image "alpine" in 882.126701ms
  Warning  Unhealthy  2m29s (x9 over 3m53s)  kubelet            Liveness probe failed: Get "http://192.168.87.203:8080/healthz": dial tcp 192.168.87.203:8080: connect: connection refused
  Normal   Killing    2m29s (x3 over 3m47s)  kubelet            Container liveness-fails-pod failed liveness probe, will be restarted
  Normal   Pulling    119s (x4 over 3m56s)   kubelet            Pulling image "alpine"
These two events are the ones we are looking for:
Warning  Unhealthy  2m29s (x9 over 3m53s)  kubelet  Liveness probe failed: Get "http://192.168.87.203:8080/healthz": dial tcp 192.168.87.203:8080: connect: connection refused
Normal   Killing    2m29s (x3 over 3m47s)  kubelet  Container liveness-fails-pod failed liveness probe, will be restarted
The liveness probe keeps failing, so Kubernetes is killing and restarting the pod.
This can happen either because the liveness probe is configured incorrectly or because the application is genuinely unhealthy. Start by checking the probe configuration, then the application itself (for example, hit the probe endpoint yourself with kubectl port-forward and curl).
...
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
...
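A probe that fires too early or too aggressively will kill a perfectly healthy but slow-starting container. A sketch of a more forgiving configuration (the field names are standard Kubernetes probe fields; the values here are illustrative, not recommendations for your workload):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15   # give the app time to start before the first probe
  periodSeconds: 10         # probe every 10 seconds
  failureThreshold: 3       # restart only after 3 consecutive failures
```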
I hope this was helpful!