Over the past few months I’ve been trying to learn everything there is to know about Kubernetes. Kubernetes is an amazing technology for deploying and scaling containers, though it comes with a cost. It’s an incredibly complex piece of software and there are a ton of bells and whistles to become familiar with. One way that I’ve found for coming up to speed is Joe Beda’s weekly TGIK live broadcast. This occurs each Friday at 4PM EST and is chock-full of fantastic information. In episode seventy-nine Joe discusses ytt and kapp. You can watch it here:
Here are some of my takeaways from the episode:
kapp deploy -a foo -f manifestdir/
kubectl get deploy --show-labels foo
kapp inspect -a app --tree
kapp deploy -c -a app -f yaml
kapp logs -f -a app
#@ def labels():
foo: bar
appname: zebra
#@ end
spec:
  selector:
    matchLabels: #@ labels()
  template:
    metadata:
      labels: #@ labels()
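ytt templates are written in Starlark, a dialect of Python, so the labels() function above behaves much like an ordinary function that returns a map. The sketch below mimics the idea in plain Python. It is not ytt itself, and deployment_spec() is a hypothetical helper. The labels are defined once and reused for both the selector and the pod template, so the two label sets cannot drift apart.

```python
import json


def labels():
    # Single source of truth for the labels, like the ytt "#@ def labels()" fragment.
    return {"foo": "bar", "appname": "zebra"}


def deployment_spec():
    # Hypothetical helper: reuse labels() in both places that must agree.
    return {
        "spec": {
            "selector": {"matchLabels": labels()},
            "template": {"metadata": {"labels": labels()}},
        }
    }


spec = deployment_spec()["spec"]
# An apps/v1 Deployment whose selector does not match its template labels is rejected.
assert spec["selector"]["matchLabels"] == spec["template"]["metadata"]["labels"]
print(json.dumps(spec, indent=2))
```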
As a long-time Kubernetes user, the question I hear most often is “how do I create manifests (the files that describe how to create and manage resources in a cluster)?” When I ask the person posing the question how they are creating resources today, I frequently hear that they cobbled together a bunch of random manifests they found on the ‘net or are running $(kubectl apply -f http://site/manifest) based on a website suggestion.
Learning how to generate manifests from scratch baffled me when I was first getting started with Kubernetes. I couldn’t find a comprehensive guide showing how to create resources from scratch, and the information needed to become proficient with this process was scattered across various sites. To assist folks who are just entering the K8S space I thought I would document the process I use to approach the “how do I create a manifest from scratch?” question.
So let’s begin with the basics. A Kubernetes manifest describes the resources (e.g., Deployments, Services, Pods, etc.) you want to create, and how you want those resources to run inside a cluster. I will describe how to learn more about each resource type later in this post. When you define a resource in a manifest it will contain the following four fields:
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
spec:
  ...
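As a quick sanity check, the four top-level fields can be verified on a manifest after it has been parsed into a dictionary. Parsing requires a YAML library, which this sketch assumes and does not show. The following minimal Python sketch uses a hypothetical missing_fields() helper. Not every kind has a spec: (a ConfigMap, for example, uses data:), so treat the check as a checklist for workload resources such as Deployments.

```python
REQUIRED_FIELDS = ("apiVersion", "kind", "metadata", "spec")


def missing_fields(manifest):
    """Return the top-level fields from REQUIRED_FIELDS that the manifest lacks."""
    return [field for field in REQUIRED_FIELDS if field not in manifest]


manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx"},
    "spec": {},
}
print(missing_fields(manifest))                # []
print(missing_fields({"kind": "Deployment"}))  # ['apiVersion', 'metadata', 'spec']
```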
The apiVersion: field specifies the API group that contains the resource and the version of that API to use. Kubernetes APIs are aggregated into API groups, which allows the API server to group APIs by purpose. If we dissect the apiVersion line above, “apps” is the API group and “v1” is the version of the apps API to use. To list the available API groups and their versions you can run kubectl with the “api-versions” option:
$ kubectl api-versions |more
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
...
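Each entry in the api-versions output follows the group/version pattern described above. The one exception is the core group (Pods, Services, ConfigMaps), which is written as a bare “v1” with no group prefix. Here is a small Python sketch, using a hypothetical split_api_version() helper, that splits an apiVersion string into its two parts:

```python
def split_api_version(api_version):
    """Split an apiVersion string into (group, version).

    Core-group resources use a bare version such as "v1",
    so their group comes back as the empty string.
    """
    group, _, version = api_version.rpartition("/")
    return group, version


print(split_api_version("apps/v1"))                               # ('apps', 'v1')
print(split_api_version("v1"))                                    # ('', 'v1')
print(split_api_version("admissionregistration.k8s.io/v1beta1"))  # ('admissionregistration.k8s.io', 'v1beta1')
```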
The second line, “kind:”, lists the type of resource you want to create. Deployments, ReplicaSets, CronJobs, StatefulSets, etc. are examples of resources you can create. You can use the kubectl “api-resources” command to view the available resource types as well as the API group each one is associated with:
$ kubectl api-resources |more
NAME                              SHORTNAMES   APIGROUP                       NAMESPACED   KIND
daemonsets                        ds           apps                           true         DaemonSet
deployments                       deploy       apps                           true         Deployment
replicasets                       rs           apps                           true         ReplicaSet
statefulsets                      sts          apps                           true         StatefulSet
...
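Combining the two commands is mostly a lookup: find the kind’s APIGROUP, then pick a version from api-versions. Below is a minimal Python sketch of that lookup. It runs against a small, hand-made sample table rather than live kubectl output, and parse_api_resources(), api_version() and the column widths are all assumptions made for illustration. The sketch slices each row at the header’s column offsets instead of splitting on whitespace, because SHORTNAMES and APIGROUP can be blank. The exact column layout also varies between kubectl releases.

```python
WIDTHS = (14, 13, 11, 13)  # column widths for this hand-made sample only


def _row(*cols):
    # Pad every column except the last, the way kubectl aligns its table output.
    return "".join(c.ljust(w) for c, w in zip(cols, WIDTHS)) + cols[-1]


SAMPLE = "\n".join([
    _row("NAME", "SHORTNAMES", "APIGROUP", "NAMESPACED", "KIND"),
    _row("deployments", "deploy", "apps", "true", "Deployment"),
    _row("pods", "po", "", "true", "Pod"),
])


def parse_api_resources(text):
    """Slice each row at the header's column offsets so blank cells stay blank."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0]
    names = header.split()
    starts = [header.index(name) for name in names]
    ends = starts[1:] + [None]
    return [
        {name: line[s:e].strip() for name, s, e in zip(names, starts, ends)}
        for line in lines[1:]
    ]


def api_version(group, version):
    # Core-group kinds (empty APIGROUP) use the bare version, e.g. "v1".
    return f"{group}/{version}" if group else version


by_kind = {row["KIND"]: row for row in parse_api_resources(SAMPLE)}
print(api_version(by_kind["Deployment"]["APIGROUP"], "v1"))  # apps/v1
print(api_version(by_kind["Pod"]["APIGROUP"], "v1"))         # v1
```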
With the “api-versions” and “api-resources” commands we can find out the available resources (KIND column), the API group (APIGROUP column) the resource type is associated with, and the API group versions (output from api-versions). This information can be used to fill in the apiVersion: and kind: fields. To understand the purpose of each resource type you can use the kubectl “explain” command:
$ kubectl explain --api-version=apps/v1 replicaset
KIND:     ReplicaSet
VERSION:  apps/v1
DESCRIPTION:
     ReplicaSet ensures that a specified number of pod replicas are running at
     any given time. 
This will give you a detailed explanation of the resource passed as an argument as well as the fields you can populate. Nifty! Now that we’ve covered the first two fields we can move on to metadata: and spec:. The metadata: section is used to uniquely identify the resource inside a Kubernetes cluster. This is where you name the resource, assign labels and annotations, specify a namespace, etc. To view the fields you can add to the metadata: section, append “.metadata” to the resource type passed to “explain”:
$ kubectl explain deployment.metadata | more
KIND:     Deployment
VERSION:  extensions/v1beta1
RESOURCE: metadata <Object>
DESCRIPTION:
     Standard object metadata.
                                                                                                                                             
     ObjectMeta is metadata that all persisted resources must have, which
     includes all objects users must create.
FIELDS:
   annotations  <map[string]string>
     Annotations is an unstructured key value map stored with a resource that
     may be set by external tools to store and retrieve arbitrary metadata. They
     are not queryable and should be preserved when modifying objects. More
     info: http://kubernetes.io/docs/user-guide/annotations
...
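As a rough illustration of what “uniquely identify” means, the following Python sketch builds an identity tuple from a parsed manifest: API group, kind, namespace and name. The object_key() helper is hypothetical, and the sketch simplifies things in two ways. It only handles namespaced kinds, and it assumes a missing namespace means “default”. This is not how the API server stores objects, but it shows why two Deployments named web can coexist in different namespaces.

```python
def object_key(manifest, default_namespace="default"):
    """Return (group, kind, namespace, name) for a namespaced object.

    Assumption: the kind is namespaced, and a missing metadata.namespace means
    "default" (kubectl's usual default unless your context says otherwise).
    """
    group = manifest["apiVersion"].rpartition("/")[0]  # "" for the core group
    metadata = manifest["metadata"]
    return (group, manifest["kind"], metadata.get("namespace", default_namespace), metadata["name"])


web_prod = {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web", "namespace": "prod"}}
web_dev = {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web", "namespace": "dev"}}

# Same name, different namespaces: two distinct objects.
print(object_key(web_prod) != object_key(web_dev))  # True
print(object_key(web_prod))                         # ('apps', 'Deployment', 'prod', 'web')
```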
Now that we’ve covered the first three fields, let’s dig into what makes a manifest tick: the spec: section! This section describes how to create and manage a resource. Here you define the container image to use, the number of replicas in a ReplicaSet, the selector criteria, liveness and readiness probe definitions, etc. To view the fields you can add to the spec: section, append “.spec” to the resource type passed to explain:
$ kubectl explain deployment.spec | more
KIND:     Deployment
VERSION:  extensions/v1beta1
RESOURCE: spec <Object>
DESCRIPTION:
     Specification of the desired behavior of the Deployment.
     DeploymentSpec is the specification of the desired behavior of the
     Deployment.
FIELDS:
   minReadySeconds      <integer>
     Minimum number of seconds for which a newly created pod should be ready
     without any of its container crashing, for it to be considered available.
     Defaults to 0 (pod will be considered available as soon as it is ready)
...
kubectl explain does a really nice job of showing the fields under each section, but stitching them together by hand takes time and a lot of patience. To make this process easier, the kubectl developers provided the “-o yaml” and “--dry-run” options. These options can be combined with the run and create commands to generate a basic manifest for the resource passed as an argument. Newer kubectl releases expect “--dry-run=client” instead of the bare flag.
$ kubectl create deployment nginx --image=nginx -o yaml --dry-run
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        resources: {}
status: {}
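The generated manifest contains a few placeholders (creationTimestamp: null, strategy: {}, resources: {}, status: {}) that you may want to remove before committing it. Here is a minimal Python sketch, assuming the output has already been parsed into a dict (the parsing step, which would need a YAML library, is not shown). The strip_dry_run_noise() helper is hypothetical. It deletes only those specific fields rather than every empty map, because some empty maps, such as emptyDir: {}, are meaningful.

```python
import copy
import json

# The dry-run output above, expressed as the equivalent Python data.
generated = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"creationTimestamp": None, "labels": {"app": "nginx"}, "name": "nginx"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "nginx"}},
        "strategy": {},
        "template": {
            "metadata": {"creationTimestamp": None, "labels": {"app": "nginx"}},
            "spec": {"containers": [{"image": "nginx", "name": "nginx", "resources": {}}]},
        },
    },
    "status": {},
}


def strip_dry_run_noise(manifest):
    """Return a copy without the placeholder fields kubectl emitted above."""
    cleaned = copy.deepcopy(manifest)  # leave the caller's dict untouched
    cleaned.pop("status", None)
    cleaned.get("metadata", {}).pop("creationTimestamp", None)
    spec = cleaned.get("spec", {})
    if spec.get("strategy") == {}:
        del spec["strategy"]
    template = spec.get("template", {})
    template.get("metadata", {}).pop("creationTimestamp", None)
    for container in template.get("spec", {}).get("containers", []):
        if container.get("resources") == {}:
            del container["resources"]
    return cleaned


print(json.dumps(strip_dry_run_noise(generated), indent=2))
```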
Once you have a basic manifest to work with, you can start extending it by adding fields to the spec: and metadata: sections. You can also add the “--recursive” option to kubectl explain to get a hierarchical view of the various fields. The following example recursively shows every option you can use to customize the containers field:
$ kubectl explain deployment.spec.template.spec.containers --recursive | more
KIND:     Deployment
VERSION:  extensions/v1beta1
RESOURCE: containers <[]Object>
DESCRIPTION:
     List of containers belonging to the pod. Containers cannot currently be
     added or removed. There must be at least one container in a Pod. Cannot be
     updated.
     A single application container that you want to run within a pod.
FIELDS:
   args <[]string>
   command      <[]string>
   env  <[]Object>
      name      <string>
      value     <string>
      valueFrom <Object>
         configMapKeyRef        <Object>
            key <string>
            name        <string>
            optional    <boolean>
...
If you want to learn more about a specific field you can pass it to explain to get more information:
$ kubectl explain deployment.spec.selector.matchExpressions.operator
KIND:     Deployment
VERSION:  extensions/v1beta1
FIELD:    operator <string>
DESCRIPTION:
     operator represents a key's relationship to a set of values. Valid
     operators are In, NotIn, Exists and DoesNotExist.
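To make the four operators concrete, here is a small Python sketch of how a single matchExpressions entry is evaluated against an object’s labels. The requirement_matches() and selector_matches() helpers and the sample labels are hypothetical. Two details are easy to miss. An object that lacks the key entirely satisfies NotIn. Multiple expressions are ANDed together.

```python
def requirement_matches(labels, key, operator, values=()):
    """Evaluate one matchExpressions entry against an object's labels."""
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # An object without the key also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")


def selector_matches(labels, match_expressions):
    # Every expression must match; they are ANDed together.
    return all(
        requirement_matches(labels, e["key"], e["operator"], e.get("values", ()))
        for e in match_expressions
    )


pod_labels = {"app": "nginx", "tier": "frontend"}
expressions = [
    {"key": "app", "operator": "In", "values": ["nginx", "kuard"]},
    {"key": "canary", "operator": "DoesNotExist"},
]
print(selector_matches(pod_labels, expressions))  # True
```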
I hope this brief explanation of how to get started with Kubernetes manifests is helpful. This post is definitely a work in progress and I plan to add to it as questions come in. If you have any questions or comments please hit me up on Twitter.
Huge thanks to Duffie Cooley and Joe Beda for sharing $(kubectl explain --recursive) on TGIK. Awesome tip! If you want to start learning more about Kubernetes from several experts in the community, please tune in each Friday to TGIK (Thank God It’s Kubernetes). Each episode dives deep into how various Kubernetes technologies work. You will pick up nifty little tips and tricks left and right and wonder why you didn’t start watching it sooner.
This morning I wanted to better understand how requests to ClusterIPs get routed to Kubernetes pods. Properly functioning networking is critical to Kubernetes and having a solid understanding of what happens under the covers makes debugging problems much, much easier. To get started with my studies I fired up five kuard pods:
$ kubectl create -f kuard.yaml
replicaset "kuard" created
$ kubectl get pods -o wide
NAME          READY     STATUS    RESTARTS   AGE       IP         NODE
kuard-8xwx7   1/1       Running   0          36s       10.1.4.3   kubworker4.prefetch.net
kuard-bd4cj   1/1       Running   0          36s       10.1.1.3   kubworker2.prefetch.net
kuard-hfkgd   1/1       Running   0          36s       10.1.2.4   kubworker5.prefetch.net
kuard-j9fks   1/1       Running   0          36s       10.1.0.3   kubworker3.prefetch.net
kuard-lpzlr   1/1       Running   0          36s       10.1.3.3   kubworker1.prefetch.net
I created 5 pods so one would hopefully be placed on each worker node. Once the pods finished creating I exposed the pods to the cluster with the kubectl expose command:
$ kubectl expose rs kuard --port=8080 --target-port=8080
$ kubectl get svc -o wide kuard
NAME      TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE       SELECTOR
kuard     ClusterIP   10.2.21.155   <none>        8080/TCP   20s       run=kuard
Behind the scenes, kube-proxy (running in its iptables mode) uses iptables-save and iptables-restore to add rules. Here is the first rule that applies to the kuard service I exposed above:
-A KUBE-SERVICES -d 10.2.21.155/32 -p tcp -m comment --comment "default/kuard: cluster IP" -m tcp --dport 8080 -j KUBE-SVC-CUXC5A3HHHVSSN62
This rule checks whether the destination (argument to “-d”) matches the cluster IP, the destination port (argument to “--dport”) is 8080, and the protocol (argument to “-p”) is tcp. If all three checks pass, the packet jumps to the KUBE-SVC-CUXC5A3HHHVSSN62 chain. Here are the rules in that chain:
-A KUBE-SVC-CUXC5A3HHHVSSN62 -m comment --comment "default/kuard:" -m statistic --mode random --probability 0.20000000019 -j KUBE-SEP-CA6TP3H7ZVLC3JFW
-A KUBE-SVC-CUXC5A3HHHVSSN62 -m comment --comment "default/kuard:" -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-ZHHZWPGVXXVHUF5F
-A KUBE-SVC-CUXC5A3HHHVSSN62 -m comment --comment "default/kuard:" -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-H2VR42IC623XBWYH
-A KUBE-SVC-CUXC5A3HHHVSSN62 -m comment --comment "default/kuard:" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-AXZRC2VTEV7ZDZ2C
-A KUBE-SVC-CUXC5A3HHHVSSN62 -m comment --comment "default/kuard:" -j KUBE-SEP-5NFQVOMYN3PVBXGK
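This chain contains one rule per pod. The rules are evaluated from top to bottom. The iptables-extensions statistic module (in random mode) gives each rule a probability of matching: 1/5 for the first rule, 1/4 for the second, 1/3 for the third and 1/2 for the fourth, and the final rule catches whatever is left. Taken together, each of the five pods receives roughly 20% of new connections. Once a rule matches, iptables jumps to the target passed to “-j”. Here are the chains it will jump to: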
-A KUBE-SEP-5NFQVOMYN3PVBXGK -s 10.1.4.3/32 -m comment --comment "default/kuard:" -j KUBE-MARK-MASQ
-A KUBE-SEP-5NFQVOMYN3PVBXGK -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.4.3:8080
-A KUBE-SEP-AXZRC2VTEV7ZDZ2C -s 10.1.3.3/32 -m comment --comment "default/kuard:" -j KUBE-MARK-MASQ
-A KUBE-SEP-AXZRC2VTEV7ZDZ2C -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.3.3:8080
-A KUBE-SEP-CA6TP3H7ZVLC3JFW -s 10.1.0.3/32 -m comment --comment "default/kuard:" -j KUBE-MARK-MASQ
-A KUBE-SEP-CA6TP3H7ZVLC3JFW -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.0.3:8080
-A KUBE-SEP-H2VR42IC623XBWYH -s 10.1.2.4/32 -m comment --comment "default/kuard:" -j KUBE-MARK-MASQ
-A KUBE-SEP-H2VR42IC623XBWYH -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.2.4:8080
-A KUBE-SEP-ZHHZWPGVXXVHUF5F -s 10.1.1.3/32 -m comment --comment "default/kuard:" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZHHZWPGVXXVHUF5F -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.1.3:8080
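To see why those odd-looking probabilities produce an even spread, here is a small Python sketch (not kube-proxy code). It assumes n endpoints and rules tried in order, where rule i (counting from 0) matches with probability 1/(n - i). The helper names are hypothetical. Exact fractions are used, and the long decimals in the iptables output above are approximations of these values.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probabilities for n endpoints; the last rule always matches."""
    return [Fraction(1, n - i) for i in range(n)]


def overall_share(probs):
    """Chance that each rule is the one taken when rules are tried top to bottom."""
    shares, remaining = [], Fraction(1)
    for p in probs:
        shares.append(remaining * p)  # reach this rule, then match it
        remaining *= 1 - p            # reach this rule, then fall through
    return shares


probs = rule_probabilities(5)
print([str(p) for p in probs])      # ['1/5', '1/4', '1/3', '1/2', '1']
print(set(overall_share(probs)))    # {Fraction(1, 5)}
```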
Now here’s where the magic occurs! Once an endpoint chain is picked, the destination (the service IP) is rewritten to the selected pod’s IP and port by the DNAT target’s “--to-destination” option. Traffic then traverses the host’s network interface and arrives at the destination node, where it can be funneled to the pod (it’s pretty amazing and scary how this works behind the scenes). If I curl the service IP on port 8080:
$ curl 10.2.21.155:8080 > /dev/null
We can see the initial SYN and the translated destination (the IP of the pod to send the request to) with tcpdump:
$ tcpdump -n -i ens192 port 8080 and 'tcp[tcpflags] == tcp-syn'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
13:02:09.129502 IP 192.168.2.44.48102 > 10.1.2.4.webcache: Flags [S], seq 811928755, win 29200, options [mss 1460,sackOK,TS val 3048081500 ecr 0,nop,wscale 7], length 0
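The translated destination in the capture, 10.1.2.4, can be traced back to the endpoint chain that produced it. Here is a minimal Python sketch that does this by parsing the KUBE-SEP rules shown earlier. The rules are pasted in as sample text, the regular expression is an assumption tuned to these lines, and the helpers are hypothetical. Running it points at KUBE-SEP-H2VR42IC623XBWYH, the chain that DNATs to the kuard pod on kubworker5.

```python
import re

# Sample iptables-save lines copied from the KUBE-SEP chains above.
RULES = """\
-A KUBE-SEP-5NFQVOMYN3PVBXGK -s 10.1.4.3/32 -m comment --comment "default/kuard:" -j KUBE-MARK-MASQ
-A KUBE-SEP-5NFQVOMYN3PVBXGK -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.4.3:8080
-A KUBE-SEP-AXZRC2VTEV7ZDZ2C -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.3.3:8080
-A KUBE-SEP-CA6TP3H7ZVLC3JFW -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.0.3:8080
-A KUBE-SEP-H2VR42IC623XBWYH -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.2.4:8080
-A KUBE-SEP-ZHHZWPGVXXVHUF5F -p tcp -m comment --comment "default/kuard:" -m tcp -j DNAT --to-destination 10.1.1.3:8080
"""

# Only DNAT rules match; the KUBE-MARK-MASQ lines are skipped.
DNAT_RE = re.compile(r"^-A (KUBE-SEP-\S+) .*-j DNAT --to-destination (\S+)$")


def endpoint_map(rules_text):
    """Map each KUBE-SEP chain to the pod address its DNAT rule rewrites to."""
    mapping = {}
    for line in rules_text.splitlines():
        match = DNAT_RE.match(line.strip())
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def chain_for_pod(mapping, pod_ip):
    # Reverse lookup: which endpoint chain sends traffic to this pod IP?
    return [chain for chain, dest in mapping.items() if dest.split(":")[0] == pod_ip]


print(chain_for_pod(endpoint_map(RULES), "10.1.2.4"))  # ['KUBE-SEP-H2VR42IC623XBWYH']
```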
The KUBE-SEP rules also jump to the KUBE-MARK-MASQ chain to mark packets whose source is the pod itself. I’m not 100% sure how this works (or its purpose) and will need to do some digging this weekend to see what the deal is. I learned a lot digging through packet captures and iptables rules, and I definitely have a MUCH better understanding of how pods and service IPs play with each other.
The Kubernetes ecosystem is rapidly evolving, which means tools, frameworks and ways to approach running Kubernetes in production are constantly changing. To keep up with these changes I used to use a custom script to provision test clusters with kubeadm. This worked well, but it took an excessive amount of time to spin up clusters to test features. When you are excited about testing a new tool or feature you want to do it NOW. You don’t want to wait ten minutes for the provisioning process to complete.
Well, lucky for me, Kris Nova did a TGIK on KIND (Kubernetes In Docker). I was blown away by how quickly you can provision a test cluster on a modern laptop and have used it ever since. On my meager little MacBook Air I can provision a test cluster in 20 seconds and delete it in under 5 seconds. This is absolutely mind-blowing!
This past weekend I was trying to develop a deep understanding of taints, tolerations and node affinities. To really dive into these features you need more than one worker. By default, KIND provisions a cluster with a single node that acts as both the control plane and the worker. This works great for the vast majority of test cases but comes up short when you are trying to test features that require multiple worker or control plane nodes. Luckily KIND is extremely extensible and allows advanced topologies to be described in a YAML file similar to the following:
$ cat kind.yml
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker
- role: worker
In the example above I am creating one control plane node and two workers. If you need more control plane or worker nodes, add additional role entries to the YAML file. To tell KIND to use this configuration, pass it with the “--config” option:
$ kind create cluster --name=foo --config=./kind.yml
Once KIND creates the cluster you can export the KUBECONFIG variable and begin testing:
$ export KUBECONFIG="$(kind get kubeconfig-path --name="foo")"
$ kubectl get nodes
NAME                STATUS     ROLES    AGE   VERSION
foo-control-plane   NotReady   master   43s   v1.15.0
foo-worker          NotReady   <none>   6s    v1.15.0
foo-worker2         NotReady   <none>   6s    v1.15.0
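Because each node is just one role entry, the config scales by adding lines. Here is a small Python sketch with a hypothetical kind_config() helper that renders the same file for any node count. The default arguments reproduce the kind.yml shown earlier. Newer KIND releases expect apiVersion: kind.x-k8s.io/v1alpha4, which is why apiVersion is a parameter.

```python
def kind_config(control_planes=1, workers=2, api_version="kind.sigs.k8s.io/v1alpha3"):
    """Render a KIND cluster config with the requested number of nodes."""
    lines = ["kind: Cluster", f"apiVersion: {api_version}", "nodes:"]
    lines += ["- role: control-plane"] * control_planes
    lines += ["- role: worker"] * workers
    return "\n".join(lines) + "\n"


# Write the result to a file and pass it to "kind create cluster --config".
print(kind_config(workers=3), end="")
```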
This is amazing and KIND is one of the coolest tools to hit the K8S community. Digging it!
If you work with modern orchestration and configuration management systems, you are most likely dealing with YAML and JSON on a daily basis. During testing it is periodically useful to convert between these two formats, especially when interacting directly with API gateways. The yq Go program makes this incredibly easy, and it has become a staple in my utility belt! To get started with yq you can snag a release binary from GitHub:
$ curl -L --output yq https://github.com/mikefarah/yq/releases/download/2.4.0/yq_linux_amd64
$ chmod 700 ./yq
Once downloaded you can review the available options by running yq without any arguments:
$ ./yq
Usage:
  yq [flags]
  yq [command]
Available Commands:
  delete      yq d [--inplace/-i] [--doc/-d index] sample.yaml a.b.c
  help        Help about any command
  merge       yq m [--inplace/-i] [--doc/-d index] [--overwrite/-x] [--append/-a] sample.yaml sample2.yaml
  new         yq n [--script/-s script_file] a.b.c newValue
  prefix      yq p [--inplace/-i] [--doc/-d index] sample.yaml a.b.c
  read        yq r [--doc/-d index] sample.yaml a.b.c
  write       yq w [--inplace/-i] [--script/-s script_file] [--doc/-d index] sample.yaml a.b.c newValue
Flags:
  -h, --help      help for yq
  -t, --trim      trim yaml output (default true)
  -v, --verbose   verbose mode
  -V, --version   Print version information and quit
Use "yq [command] --help" for more information about a command.
The yq project was started to provide a tool that works like jq but operates on YAML instead of JSON. One feature I find myself using frequently is the read command’s “--tojson” (-j) option:
$ ./yq r -j f.yml
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"creationTimestamp":null,"labels":{"run":"nginx"},"name":"nginx"},"spec":{"replicas":1,"selector":{"matchLabels":{"run":"nginx"}},"strategy":{},"template":{"metadata":{"creationTimestamp":null,"labels":{"run":"nginx"}},"spec":{"containers":[{"image":"nginx","name":"nginx","resources":{}}]}}},"status":{}}
This will take a YAML file, process it and spit out a JSON object. Extremely handy for crafting the JSON objects you pass to the “--data” argument in a curl POST request:
$ curl --header "Content-Type: application/json" --request POST --data <yq output goes here> http://api/path/to/rest/api
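The same POST can be assembled from a script. Here is a minimal Python sketch using only the standard library. The URL is the placeholder from the curl example, and the payload is a trimmed, hypothetical stand-in for yq’s output. The request is built but not sent. The compact separators match the single-line JSON that yq prints.

```python
import json
import urllib.request

# Placeholder endpoint from the curl example above; not a real API.
URL = "http://api/path/to/rest/api"

# Hypothetical stand-in for the object yq produced from f.yml (trimmed for brevity).
payload = {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "nginx"}}

body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
request = urllib.request.Request(
    URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted because the URL is a placeholder.
print(request.get_method(), request.full_url)
print(body.decode())
```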
In a follow-up post I will show how you can use this amazing utility to slice and dice YAML. Amazing tool!