Using docker volumes on SELinux-enabled servers

I was doing some testing this week and received the following error when I tried to access a volume inside a container:

$ touch /haproxy/i
touch: cannot touch ‘i’: Permission denied

When I checked the system logs I saw the following error:

Sep 28 18:40:23 kub1 audit[8881]: AVC avc:  denied  { write } for pid=8881 comm="touch" name="haproxy" dev="sda1" ino=655362 context=system_u:system_r:container_t:s0:c324,c837 tcontext=unconfined_u:object_r:default_t:s0 tclass=dir permissive=0

The docker container was started with the “-v” option to bind mount a directory from the host:

$ docker run -d -v /haproxy:/haproxy --restart unless-stopped haproxy:1.7.9

The error shown above was generated because I didn’t tell my orchestration tool to apply an SELinux label to the volume I was trying to map into the container. In the SELinux world processes and file system objects are given contexts to describe their purpose, and the kernel uses these contexts to decide whether a process is allowed to access a file object. To allow a docker container to access a volume on an SELinux-enabled host you need to attach the “z” or “Z” flag to the volume mount. These flags are thoroughly described in the docker-run manual page:

"To change a label in the container context, you can add either of two suffixes :z or :Z to the volume mount. These suffixes tell Docker to relabel file objects on the shared volumes. The z option tells Docker that two containers share the volume content. As a result, Docker labels the content with a shared content label. Shared volume labels allow all containers to read/write content.  The Z option tells Docker to label the content with a private unshared label. Only the current container can use a private volume."

When I added the “Z” suffix to the volume everything worked as expected:

$ docker run -d -v /haproxy:/haproxy:Z --restart unless-stopped haproxy:1.7.9

My haproxy instances fired up, life was grand and the haproxy containers were distributing connections to my back-end servers. Then a question hit me: how does this work under the covers? I started reading and came across two (one & two) excellent posts by Dan Walsh. When a container starts, the processes comprising that container are labeled with an SELinux context. You can run ‘ps -eZ’ or ‘docker inspect …’ to view the context of a container:

$ docker run --name gorp --rm -it -v /foo:/foo fedora:26 /bin/sh

$ docker inspect -f '{{ .ProcessLabel }}' gorp
system_u:system_r:container_t:s0:c31,c878

$ ps -eZ | grep $(docker inspect -f '{{ .State.Pid }}' gorp)
system_u:system_r:container_t:s0:c31,c878 20197 pts/5 00:00:00 sh

In order for the process to be able to write to a volume, the volume needs to be labeled with an SELinux context that the process context has access to. This is the purpose of the ‘[zZ]’ flags. If you start a container without the z flag you will receive a permission denied error because the SELinux volume level and the process level don’t match (you can read more about levels here). This may be easier to illustrate with an example. If I start a docker command and mount a volume without the “z” flag we can see that the SELinux levels are different:

$ docker run --name gorp --rm -it -v /foo:/foo fedora:26 /bin/sh

$ docker inspect -f '{{ .ProcessLabel }}' gorp
system_u:system_r:container_t:s0:c21,c30

$ ls -ladZ /foo
drwxr-xr-x. 2 root root system_u:object_r:container_file_t:s0:c135,c579 4096 Sep 29 12:22 /foo

If we tell docker to label the volume with the correct SELinux context prior to performing the bind mount, the levels are updated to allow the container process to access the volume. Here is another example:

$ docker run --name gorp --rm -it -v /foo:/foo:Z fedora:26 /bin/sh

$ docker inspect -f '{{ .ProcessLabel }}' gorp
system_u:system_r:container_t:s0:c126,c135

$ ls -ladZ /foo
drwxr-xr-x. 2 root root system_u:object_r:container_file_t:s0:c126,c135 4096 Sep 30 10:42 /foo

The contexts that apply to docker are defined in the lxc_contexts file:

$ cat /etc/selinux/targeted/contexts/lxc_contexts
process = "system_u:system_r:container_t:s0"
content = "system_u:object_r:virt_var_lib_t:s0"
file = "system_u:object_r:container_file_t:s0"
ro_file = "system_u:object_r:container_ro_file_t:s0"
sandbox_kvm_process = "system_u:system_r:svirt_qemu_net_t:s0"
sandbox_lxc_process = "system_u:system_r:container_t:s0"

It’s really interesting how these items are stitched together. You can read more about how this works here and here. You can also read Dan’s article describing why it’s important to leave SELinux enabled.
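
One related knob worth knowing about: if a one-off container needs broad access to host files and you don’t want docker relabeling anything, you can disable SELinux separation for just that container instead of turning it off host-wide (a sketch; check the docker-run manual page for the exact syntax your docker version supports):

$ docker run --security-opt label=disable -v /foo:/foo fedora:26 /bin/sh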

The subtle differences between the docker ADD and COPY commands

This weekend I spent some time cleaning up a number of Dockerfiles and getting them integrated into my build system. Docker provides the ADD and COPY commands to take the contents from a given source and copy them into your container. On the surface both commands appear to do the same thing but there is one slight difference. The COPY command works solely on files and directories:

The COPY instruction copies new files or directories from <src> and adds them to the filesystem of the container at the path <dest>.

While the ADD command supports files, directories AND remote URLs:

The ADD instruction copies new files, directories or remote file URLs from <src> and adds them to the filesystem of the image at the path <dest>.

The additional feature provided by ADD allows you to retrieve remote resources and stash them in your container for use by your applications:

ADD http://prefetch.net/path/to/stuff /stuff

Some of the Dockerfiles I’ve read through on github have done some extremely interesting things with ADD and remote resource retrieval. Noting this here for future reference.
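
One more ADD behavior worth noting: when the source is a local tar archive in a recognized compression format, ADD unpacks it into the destination directory automatically (remote URLs are not decompressed). For example, with a hypothetical app-1.0.tar.gz sitting in the build context:

ADD app-1.0.tar.gz /opt/app/

COPY performs no such extraction, which is one more reason the two commands aren’t interchangeable.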

One of the best docker resources on the interwebs

For the past two years I’ve scoured the official docker documentation when I needed to learn something. Their documentation is really good but there are areas that lack examples and a deep explanation of why something is the way it is. One of my goals for this year is to read one technical book / RFC a month so I decided to start off the year with James Turnbull’s The Docker Book. James starts with the basics and then extends this with a thorough description of images, testing with docker and orchestration. This is by far the best $10 I’ve spent on a book and I’m hoping to read his new Terraform book once I finish reading through my DNS RFC. Awesome job on the book James!

Making sense of docker storage drivers

Docker has a pluggable storage architecture which currently contains 6 drivers.

AUFS - The original docker storage driver.
OverlayFS - Driver built on top of overlayfs.
Btrfs - Driver built on top of btrfs.
Device Mapper - Driver built on top of the device mapper.
ZFS - Driver built on top of the ZFS file system.
VFS - A VFS-layer driver that isn't considered suitable for production.

If you have docker installed you can run ‘docker info’ to see which driver you are using:

$ docker info | grep "Storage Driver:"
Storage Driver: devicemapper

Picking the right driver isn’t straightforward due to how fast docker and the storage drivers are evolving. The docker documentation has some excellent suggestions and you can’t go wrong using the most widely used drivers. I have hit a couple of bugs with the overlayfs driver and I have never bothered with the devicemapper driver with loopback files (vs. the device mapper driver w/ direct LVM) because of Jason’s post.
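
If you aren’t sure whether your devicemapper setup is backed by loopback files, docker info will call them out when they are in use (a quick check; the exact fields vary by docker version):

$ docker info | grep -i "loop file"
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata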

My biggest storage lesson learned (i.e., I do this because I hit bugs) from the past year is to give docker a chunk of dedicated storage. This space can reside in your root volume group, a dedicated volume group or in a partition. To use a dedicated volume group you can add “VG=VOLUME_GROUP” to /etc/sysconfig/docker-storage-setup:

$ cat /etc/sysconfig/docker-storage-setup
VG="docker"

To use a dedicated disk you can add “DEVS=BLOCK_DEVICE” to /etc/sysconfig/docker-storage-setup:

$ cat /etc/sysconfig/docker-storage-setup
DEVS="/dev/sdb"

If either of these variables is set, docker-storage-setup will create an LVM thin pool which docker will use to layer images. This layering is the foundation that docker containers are built on top of.
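
Once the thin pool exists you can keep an eye on it with standard LVM tooling. For example (the field names should work with any recent LVM2 release, though your volume group and pool names will differ):

$ lvs -o lv_name,vg_name,pool_lv,lv_size,data_percent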

If you change VG or DEVS and docker is operational, you will need to back up your images, clean up /var/lib/docker and then run docker-storage-setup to apply the changes. The following shows what happens if docker-storage-setup is run w/o any options set:

$ docker-storage-setup
  Rounding up size to full physical extent 412.00 MiB
  Logical volume "docker-poolmeta" created.
  Logical volume "docker-pool" created.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted docker/docker-pool to thin pool.
  Logical volume docker/docker-pool changed.

This creates the data and metadata volumes in the root volume group and updates the docker configuration. If anyone is using the btrfs or zfs storage drivers shoot me a note to let me know what your experience has been.

Using docker to build software testing environments

I’ve been on the docker train for quite some time. While the benefits of running production workloads in containers are well known, I find docker just as valuable for evaluating and testing new software on my laptop. I’ll use this blog post to walk through how I build transient test environments for software evaluation.

Docker is based around images (Fedora, CentOS, Ubuntu, etc.), and these images can be created and customized through the use of a Dockerfile. The Dockerfile contains statements to control the OS that is used, the software that is installed, and any post-install configuration. Here is a Dockerfile I like to use for building test environments:

$ cat Dockerfile

FROM centos:7
MAINTAINER Matty

RUN yum -y update
RUN yum -y install openssh-server openldap-servers openldap-clients openldap
RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN echo 'root:XXXXXXXX' | chpasswd

RUN /usr/bin/ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -C '' -N ''
RUN /usr/bin/ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key -C '' -N ''

EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]

To create an image from this Dockerfile you can use docker build:

$ docker build -t centos:7 .

The “-t” option assigns a tag to the image which can be referenced when a new container is instantiated. To view the new image you can run docker images:

$ docker images centos
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
centos              7                   4f798f95cfe1        8 minutes ago       414.8 MB
docker.io/centos    6                   f07f6ca555a5        3 weeks ago         194.6 MB
docker.io/centos    7                   980e0e4c79ec        3 weeks ago         196.7 MB
docker.io/centos    latest              980e0e4c79ec        3 weeks ago         196.7 MB

Now to have some fun! To create a new container we can use docker run:

$ docker run -d -P -h foo --name foo --publish 2222:22 centos:7
f84477722896b2701506ee65a3f5a909199675a9cd591f3591e906a8795eba5c

This instantiates a new CentOS container with the name (--name) foo, the hostname (-h) foo and uses the centos:7 image I created earlier. It also maps (--publish) port 22 in the container to port 2222 on my local PC. To access the container you can fire up SSH and connect to port 2222 as root (this is a test container so /dev/null the hate mail):

$ ssh root@localhost -p 2222
root@localhost's password: 
[root@foo ~]# 

Now I can install software, configure it, break it and debug issues all in an isolated environment. Once I’m satisfied with my testing I can stop the container and delete it:

$ docker stop foo
foo

$ docker rm foo
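
As an aside, if you know up front that a container is throwaway, the --rm flag used in the gorp examples earlier handles this cleanup automatically; the container is removed as soon as it stops:

$ docker run --rm -it centos:7 /bin/sh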

I find that running an SSH daemon in my test containers is super valuable. For production I would take Jérôme’s advice and look into other methods for getting into your containers.
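
If you just need a shell inside a running container, docker exec will get you one without an SSH daemon at all:

$ docker exec -it foo /bin/bash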