Last month I started a course that teaches you how to write your own Operating System. Working at the intersection of hardware and software (X86 Assembly and C) has been incredibly rewarding. I’ve learned a TON! One interesting thing I came across in the Linux kernel’s bootloader code is the use of “asm volatile”. Here is a snippet from $SRC_DIR/linux-5.16/arch/x86/boot/boot.h:
#define cpu_relax() asm volatile("rep; nop")
The history behind this is super interesting, and the volatile qualifier is used to keep the compiler’s optimizer from removing or moving the inline assembly, so the code is emitted AS IS. Being somewhat curious, I wrote a simple C program to see this in action:
#define cpu_relax() asm volatile("rep; nop")
int main(int argc, char **argv) {
cpu_relax();
}
When I compiled it, I was a bit surprised that the instructions above didn’t show up verbatim in the objdump output, but they did when the source was compiled with gcc’s -S (generate assembly) option:
$ gcc -o test test.c
$ objdump -d -j .text test
0000000000001129 <main>:
1129: f3 0f 1e fa endbr64
112d: 55 push %rbp
112e: 48 89 e5 mov %rsp,%rbp
1131: 89 7d fc mov %edi,-0x4(%rbp)
1134: 48 89 75 f0 mov %rsi,-0x10(%rbp)
1138: f3 90 pause
113a: b8 00 00 00 00 mov $0x0,%eax
113f: 5d pop %rbp
1140: c3 retq
1141: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1148: 00 00 00
114b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
$ gcc -S test.c
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
#APP
# 4 "hi.c" 1
rep; nop
This one had me stumped, but luckily a super awesome friend of mine had a great theory: objdump was decoding the raw instruction bytes, and “rep; nop” assembles to the byte sequence f3 90, which is the exact encoding of the pause instruction, so objdump prints pause. Since objdump is interpreting the ELF binary, and gcc -S is generating assembly from source code, this totally makes sense. One of these days I need to study how gcc and company optimize code. This is a fascinating topic!
Container images are one of the items that make up a “container.” In most cases container images use a base image (e.g., Alpine, Ubuntu, etc.), and then one or more application-specific layers are added on top of that. There are numerous documented best practices for optimizing container images, and these best practices result in smaller images, less network traffic, and a reduction in container creation time.
Unfortunately, in practice I’ve seen numerous cases where these best practices weren’t followed. I’ve come across Dockerfiles that used dozens of RUN commands, didn’t take advantage of multi-stage builds, didn’t optimize for image layer re-use, etc. When I’ve encountered these types of issues, I’ve always taken it upon myself to work with the author to refactor the build instructions, and to educate them on best practices. The hadolint project maintains an excellent set of Docker best practice rules, which I highly suggest reviewing if you haven’t already.
When situations pop up where I need to dig into a container image, I always turn to my good buddy dive. This amazing little utility allows you to analyze a container image in a console TUI. Exploring a container image with dive is super easy: type dive into your terminal, pass the container image you want to explore as an argument, and the TUI will be displayed:
$ dive nginx:latest
One of the most useful screens in the TUI is the image details pane:
Image name: nginx
Total Image size: 142 MB
Potential wasted space: 3.8 MB
Image efficiency score: 98 %
This shows the image size, how much space is wasted, and the container’s efficiency score. The efficiency score is super helpful for understanding where the image lies on the efficiency spectrum, and if further analysis would be beneficial. Dive was developed with CI in mind. You can run dive with the “--highestUserWastedPercent” and “--highestWastedBytes” arguments as part of a CI pipeline. If the image that is created doesn’t pass muster, you can fail the build until someone reviews the build instructions.
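If memory serves, dive can also read these thresholds from a .dive-ci file in the repository, which keeps the CI policy in source control. The values below are made-up example thresholds, not recommendations:

```yaml
rules:
  # Fail the build if the image wastes more than 20 MB of space...
  highestWastedBytes: 20MB
  # ...or if more than 10% of user-layer bytes are duplicated/wasted...
  highestUserWastedPercent: 0.10
  # ...or if the overall efficiency score drops below 95%.
  lowestEfficiency: 0.95
```

You would then run dive in CI mode (dive --ci IMAGE), and a non-zero exit code fails the pipeline stage.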
If you come across an image with a poor efficiency score while debugging an issue, you can review the container’s build statements (Dockerfiles, JIB XML stanzas, etc.) in source control, or generate these on the fly with the docker history command:
$ docker history --no-trunc nginx:latest
I love debugging problems, especially ones where the end result is added efficiency. Giddy up!
One of my friends recently reached out with a fun problem. His monitoring system was periodically not firing when file systems grew past the thresholds he defined. When we hopped on one of his EC2 instances to debug the issue, I noticed that we were getting a permission denied (EACCES) errno when running df as their monitoring user:
$ df -h /vault/data
df: ‘/vault/data’: Permission denied
When we ran the same command as trusty UID 0, everything worked as expected:
$ sudo df -h /vault/data
Filesystem Size Used Avail Use% Mounted on
/dev/nvme1n1 20G 1G 20G 1% /vault/data
A quick check with strace verified this as well:
$ strace -e trace=statfs df -h 2>&1 | grep vault
statfs("/vault/data", 0x7ffe00af8aa0) = -1 EACCES (Permission denied)
If you aren’t familiar with statfs(2), it returns information about a mounted file system in a statfs structure. Here is a blurb from the manual page describing which information is returned:
The function statfs() returns information about a mounted file system. path is the pathname of any file within the mounted file system. buf is a pointer to a statfs structure defined approximately as follows:
struct statfs {
__SWORD_TYPE f_type; /* type of file system (see below) */
__SWORD_TYPE f_bsize; /* optimal transfer block size */
fsblkcnt_t f_blocks; /* total data blocks in file system */
fsblkcnt_t f_bfree; /* free blocks in fs */
fsblkcnt_t f_bavail; /* free blocks available to
unprivileged user */
fsfilcnt_t f_files; /* total file nodes in file system */
fsfilcnt_t f_ffree; /* free file nodes in fs */
fsid_t f_fsid; /* file system id */
__SWORD_TYPE f_namelen; /* maximum length of filenames */
__SWORD_TYPE f_frsize; /* fragment size (since Linux 2.6) */
__SWORD_TYPE f_spare[5];
};
I thought that df was setuid root like the mount utility, so when I initially saw the permission denied error I assumed it was something unrelated to permissions. But lo and behold, it was indeed due to df not being setuid root:
$ ls -la /usr/bin/df
-rwxr-xr-x 1 root root 100856 Jan 23 2020 /usr/bin/df
So when df tried to statfs() this file system as an unprivileged user, it got the permission denied error. We found a simple workaround to get things working, and I learned something new in the process. Neato!
If you’ve worked with Kubernetes for any length of time, you are probably intimately familiar with deployment manifests. If this concept is new to you, deployment manifests are used to add resources to a cluster in a declarative manner. Some of the larger projects (cert-manager, Istio, CNI plug-ins, etc.) in the Kubernetes ecosystem provide manifests to deploy the resources that make their application work. These manifests can often be thousands of lines long, and if you are security conscious you don’t want to deploy anything to a cluster without validating what’s in it.
The K14S project took this issue to heart when they released the kapp utility. This super useful utility can show you the changes that would be made to a cluster, without actually making any of them. To show how useful this is, let’s say you wanted to see which resources Istio would deploy. You can see this with kapp deploy:
$ kapp deploy -a istio -f <(kustomize build)
Target cluster 'https://127.0.0.1:33783' (nodes: test-control-plane, 3+)
Changes
Namespace Name Kind Conds. Age Op Op st. Wait to Rs Ri
(cluster) istio-operator ClusterRole - - create - reconcile - -
^ istio-operator ClusterRoleBinding - - create - reconcile - -
^ istio-operator Namespace - - create - reconcile - -
^ istio-system Namespace - - create - reconcile - -
^ istiooperators.install.istio.io CustomResourceDefinition - - create - reconcile - -
istio-operator istio-operator Deployment - - create - reconcile - -
^ istio-operator ServiceAccount - - create - reconcile - -
^ istio-operator-metrics Service - - create - reconcile - -
Op: 8 create, 0 delete, 0 update, 0 noop
Wait to: 8 reconcile, 0 delete, 0 noop
Continue? [yN]: N
The output contains the resource type and the operation that will take place. In the example above we are going to create 8 resources, and assign the application name “istio” (a label) to each resource. Kapp deploy can also be fed the “--diff-changes” option to display a diff between the manifests and the current cluster state, “--allow-ns” to specify the namespaces that the app is allowed to go into, and “--into-ns” to map the namespaces in the manifests to one of your choosing. Kapp will assign a label to the resources it deploys, which is used by “list” to show resources that are managed by kapp:
$ kapp list
Target cluster 'https://127.0.0.1:33783' (nodes: test-control-plane, 3+)
Apps in namespace 'default'
Name Namespaces Lcs Lca
istio (cluster),istio-operator true 4d
nginx - - -
Lcs: Last Change Successful
Lca: Last Change Age
2 apps
Succeeded
Another super useful feature of kapp is its ability to inspect an application that was previously deployed:
$ kapp inspect -a istio --tree
Target cluster 'https://127.0.0.1:33783' (nodes: test-control-plane, 3+)
Resources in app 'istio'
Namespace Name Kind Owner Conds. Rs Ri Age
(cluster) istio-operator ClusterRole kapp - ok - 4d
istio-operator istio-operator ServiceAccount kapp - ok - 4d
(cluster) istiooperators.install.istio.io CustomResourceDefinition kapp 2/2 t ok - 4d
istio-operator istio-operator-metrics Service kapp - ok - 4d
istio-operator L istio-operator-metrics Endpoints cluster - ok - 4d
(cluster) istio-operator ClusterRoleBinding kapp - ok - 4d
(cluster) istio-system Namespace kapp - ok - 4d
(cluster) istio-operator Namespace kapp - ok - 4d
istio-operator istio-operator Deployment kapp 2/2 t ok - 4d
istio-operator L istio-operator-77d57c5c57 ReplicaSet cluster - ok - 4d
istio-operator L.. istio-operator-77d57c5c57-dkl8b Pod cluster 4/4 t ok - 4d
Rs: Reconcile state
Ri: Reconcile information
11 resources
Succeeded
In the output above you can see the resource relationships in tree form, the object type, the owner, and the state of the resource. This is a crazy useful utility, and one I’ve started to use almost daily. It’s super useful for observing the state of a cluster, and for debugging problems. Thanks K14S for this amazing piece of software!
This past week I got to spend some time upgrading my CI/CD systems. The GitLab upgrade process requires stepping through specific versions when you upgrade between major releases, which can be a problem if the latest version isn’t supported by the upgrade scripts. In these types of situations, you can tell yum to upgrade to a specific version. To list the versions of a package that are available, you can use the search command’s “--showduplicates” option:
$ yum search --showduplicates gitlab-ee | grep 13.0
gitlab-ee-13.0.0-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.1-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.3-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.4-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.5-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.6-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.7-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.8-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.9-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.10-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
gitlab-ee-13.0.12-ee.0.el7.x86_64 : GitLab Enterprise Edition (including NGINX,
Once you eye the version you want, you can pass it to yum install:
$ yum install gitlab-ee-13.0.12-ee.0.el7.x86_64
This can also be useful if you want to stick to a minor version vs. upgrading to a new major release.