Debugging production issues can sometimes be a challenge in Kubernetes environments. One specific challenge is debugging containers that don’t contain a shell. You may have seen the following when troubleshooting an issue:
$ kubectl exec -it -n kube-system coredns-558bd4d5db-gx469 -- sh
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "4f053952703f78b51bdf38a26ed391d8c2bda4138b87f35170d3fc4ea14fc510": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "sh": executable file not found in $PATH: unknown
Not including a shell in your base image is a best practice, and projects like distroless make it super easy to package your applications with a small shell-less footprint. But when apps go rogue, what options do we have to debug them if the container doesn’t include a shell?
If you have shell access to the Kubernetes node the pod is running on, nsenter and the binaries on that host are a great way to debug problems. But what if you don’t have access to the node, as is the case with some managed Kubernetes services? In this case, ephemeral containers and kubectl debug may be a good option for you.
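For the node-access case, here is a minimal sketch of the nsenter approach. It assumes the node runs a CRI runtime such as containerd with crictl installed, and that you run it as root. The helper name netns_exec is made up for this example, and the `{{.info.pid}}` template path may differ between runtimes.

```shell
# Hypothetical helper: run a host binary inside a container's network namespace.
# Assumes crictl and nsenter are available on the node and you are root.
netns_exec() (
  container_name="$1"
  shift

  # Find the ID of the first running container with a matching name.
  container_id=$(crictl ps --name "$container_name" -q | head -n 1)
  if [ -z "$container_id" ]; then
    echo "no running container named $container_name" >&2
    exit 1
  fi

  # Ask the runtime for the host PID of the container's main process.
  pid=$(crictl inspect --output go-template --template '{{.info.pid}}' "$container_id")

  # Enter only the network namespace (-n), so the remaining arguments
  # run as a host binary against the container's network stack.
  nsenter -t "$pid" -n "$@"
)

# Example (on the node): netns_exec coredns ss -lntup
```

Because only the network namespace is entered, the command you pass is resolved from the host's filesystem, which is what makes this useful for shell-less images.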
Ephemeral container support went to beta in Kubernetes 1.23 and is enabled by default from that release onward. This nifty feature allows you to spin up a container of your choosing alongside an existing container in a running pod. Here is an example that creates an Ubuntu container and attaches it to a shell-less coredns pod by placing it in the coredns container's process (PID) namespace:
$ kubectl debug -n kube-system -it coredns-64897985d-tn4tb --target=coredns --image=ubuntu
Targeting container "coredns". If you don't see processes from this container
it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-vx6mk.
If you don't see a command prompt, try pressing enter.
root@coredns-64897985d-tn4tb:/# ps auxwww
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.2 0.5 750840 43488 ? Ssl 14:29 0:01 /coredns -conf /etc/coredns/Corefile
root 22 0.3 0.0 4248 3380 pts/0 Ss 14:39 0:00 bash
root 37 0.0 0.0 5900 2916 pts/0 R+ 14:39 0:00 ps auxwww
root@coredns-64897985d-tn4tb:/# dlv attach 1
Once you are in the debug container, you can install software, load up debuggers, etc. to get to the bottom of your issue. This is especially handy when you remove a problematic pod from a service so it no longer receives traffic. This allows you to debug in isolation, and without the time constraints that are usually associated with broken applications. Super cool feature!
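One common way to take a pod out of a Service is to remove the label the Service selects on. The sketch below assumes the Service matches on a single label key (for example app). The helper name quarantine_pod is made up for this example. Keep in mind that if a ReplicaSet or Deployment selects on the same label, removing it will likely detach the pod from that controller too, and the controller will start a replacement.

```shell
# Hypothetical helper: remove a label from a pod so a Service selecting on
# that label stops routing traffic to it.
#   $1 = namespace, $2 = pod name, $3 = label key used by the Service selector
quarantine_pod() {
  # A trailing "-" after the key tells kubectl label to remove that label.
  kubectl label pod -n "$1" "$2" "$3-"
}

# Example: quarantine_pod default my-app-7d4b9c app
```

The pod keeps running after the label is removed, so you can attach a debug container to it with kubectl debug and take your time.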