Using docker volumes on SELinux-enabled servers

I was doing some testing this week and received the following error when I tried to access a volume inside a container:

$ touch /haproxy/i

touch: cannot touch 'i': Permission denied

When I checked the system logs I saw the following error:

Sep 28 18:40:23 kub1 audit[8881]: AVC avc: denied { write } for pid=8881 comm="touch" name="haproxy" dev="sda1" ino=655362 context=system_u:system_r:container_t:s0:c324,c837 tcontext=unconfined_u:object_r:default_t:s0 tclass=dir permissive=0

The docker container was started with the “-v” option to bind mount a directory from the host:

$ docker run -d -v /haproxy:/haproxy --restart unless-stopped

The error shown above was generated because I didn’t tell my orchestration tool to apply an SELinux label to the volume I was trying to map into the container. In the SELinux world processes and file system objects are given contexts to describe their purpose. These contexts are then used by the kernel to allow processes to access file objects if policy allows it. To allow a docker container to access a volume on a SELinux-enabled host you need to attach the “z” or “Z” flag to the volume mount. These flags are thoroughly described in the docker-run manual page:

“To change a label in the container context, you can add either of two suffixes :z or :Z to the volume mount. These suffixes tell Docker to relabel file objects on the shared volumes. The z option tells Docker that two containers share the volume content. As a result, Docker labels the content with a shared content label. Shared volume labels allow all containers to read/write content. The Z option tells Docker to label the content with a private unshared label. Only the current container can use a private volume.”

When I added the “Z” suffix to the volume everything worked as expected:

$ docker run -d -v /haproxy:/haproxy:Z --restart unless-stopped

My haproxy instances fired up, life was grand and the haproxy containers were distributing connections to my back-end servers. Then a question hit me. How does this work under the covers? I started reading and came across two (one & two) excellent posts by Dan Walsh. When a container starts the processes comprising that container will be labeled with an SELinux context. You can run ‘ps -eZ’ or ‘docker inspect …’ to view the context of a container:

$ docker run --name gorp --rm -it -v /foo:/foo fedora:26 /bin/sh /bin/sh.distrib

$ docker inspect -f '{{ .ProcessLabel }}' gorp


$ ps -eZ | grep (docker inspect -f '{{ .State.Pid }}' gorp)

system_u:system_r:container_t:s0:c31,c878 20197 pts/5 00:00:00 sh

In order for the process to be able to write to a volume the volume needs to be labeled with a SELinux context that the process context has access to. This is the the purpose of the ‘[zZ]’ flags. If you start a container without the z flag you will receive a permission denied error because the SELinux volume level and the process level don’t match (you can read more about levels here). This may be easier to illustrate with an example. If I start a docker command and mount a volume without the “z” flag we can see that the SELinux levels are different:

$ docker run --name gorp --rm -it -v /foo:/foo fedora:26 /bin/sh /bin/sh.distrib

$ docker inspect -f '{{ .ProcessLabel }}' gorp


$ ls -ladZ /foo

drwxr-xr-x. 2 root root
system_u:object_r:container_file_t:s0:c135,c579 4096 Sep 29
12:22 /foo

If we tell docker to label the volume with the correct SELinux context prior to performing the bind mount the levels are updated to allow the container process to access the volume. Here is another example:

$ docker run --name gorp --rm -it -v /foo:/foo:Z fedora:26 /bin/sh /bin/sh.distrib

$ docker inspect -f '{{ .ProcessLabel }}' gorp


$ ls -ladZ /foo

drwxr-xr-x. 2 root root
system_u:object_r:container_file_t:s0:c126,c135 4096 Sep 30 10:42 /foo

The contexts that apply to docker are defined in the lxc_contexts file:

$ cat /etc/selinux/targeted/contexts/lxc_contexts

process = "system_u:system_r:container_t:s0"
content = "system_u:object_r:virt_var_lib_t:s0"
file = "system_u:object_r:container_file_t:s0"
sandbox_kvm_process = "system_u:system_r:svirt_qemu_net_t:s0"
sandbox_kvm_process = "system_u:system_r:svirt_qemu_net_t:s0"
sandbox_lxc_process = "system_u:system_r:container_t:s0"

It’s really interesting how these items are stitched together. You can read more about how this works here and here. You can also read Dan’s article describing why it’s important to leave SELinux enabled.

This article was posted by Matty on 2017-09-30 12:21:00 -0400 -0400