Over the past two months I’ve been preparing to take the Kubernetes Administrator certification. As part of my prep work I’ve been reading a lot of code and breaking things in my HA cluster to see how they break and what is required to fix them. I’ve also been automating every part of the cluster build process to get more familiar with ansible and the cluster bootstrap process.
Another area I’ve spent a tremendous amount of time on is Kubernetes networking. The Kubernetes network architecture was incredibly confusing when I first got into K8s, so a lot of my time has been spent studying how layer-3 routing and overlay networking work under the covers. In the world of Kubernetes every pod is assigned an IP address, and Kubernetes assumes pods are able to talk to other pods via these IPs. The Kubernetes cluster networking document describes the reason behind this:
Kubernetes assumes that pods can communicate with other pods, regardless of which host they land on. We give every pod its own IP address so you do not need to explicitly create links between pods and you almost never need to deal with mapping container ports to host ports. This creates a clean, backwards-compatible model where pods can be treated much like VMs or physical hosts from the perspectives of port allocation, naming, service discovery, load balancing, application configuration, and migration.
There are several solutions available to help with the pod-to-pod network connectivity requirement. I’ve done a good deal of work with flannel and weave and they both work remarkably well! I’ve also implemented a flat layer-3 network solution using host routes. Lorenzo Nicora provided a way to create and apply these routes with ansible via the kubernetes-routing.yaml playbook. If you want to see the routes that will be generated you can run the kubectl get nodes command listed at the top of the playbook:
$ kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}'
192.168.2.44 10.1.4.0/24
192.168.2.45 10.1.0.0/24
192.168.2.46 10.1.2.0/24
192.168.2.47 10.1.3.0/24
192.168.2.48 10.1.1.0/24
This command returns the list of nodes as a JSON object and iterates over the elements to extract the InternalIP address and pod CIDR assigned to each worker. The kubernetes-routing playbook takes this concept and creates a number of tasks to extract this information, create routes, and apply them to the workers. When I was first experimenting with this playbook I bumped into the following error:
TASK [kubernetes-workers : Get a list of IP addresses] ***********************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TemplateRuntimeError: no test named 'equalto'
fatal: [kubworker1.homefetch.net]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}
The error came from the following task:
- name: Get a list of IP addresses
  set_fact:
    kubernetes_nodes_addresses: "{{ node_addresses_tmp.results|map(attribute='item')|selectattr('type','equalto','InternalIP')|map(attribute='address')|list }}"
After a bit of searching I came across a Jinja2 commit related to the equalto test. This feature was introduced in Jinja 2.8 and unfortunately I had an older version installed (the documentation notes this requirement so this was my own fault). After upgrading with pip:
$ pip install --upgrade Jinja2
Collecting Jinja2
Downloading Jinja2-2.10-py2.py3-none-any.whl (126kB)
100% |████████████████████████████████| 133kB 2.1MB/s
Collecting MarkupSafe>=0.23 (from Jinja2)
Downloading MarkupSafe-1.0.tar.gz
Installing collected packages: MarkupSafe, Jinja2
Found existing installation: MarkupSafe 0.11
Uninstalling MarkupSafe-0.11:
Successfully uninstalled MarkupSafe-0.11
Running setup.py install for MarkupSafe ... done
Found existing installation: Jinja2 2.7.2
Uninstalling Jinja2-2.7.2:
Successfully uninstalled Jinja2-2.7.2
Successfully installed Jinja2-2.10 MarkupSafe-1.0
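If you aren’t sure which Jinja2 version ansible is picking up, you can ask the interpreter directly. A quick sanity check, assuming ansible runs under the system python:

$ python -c 'import jinja2; print(jinja2.__version__)'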
The playbook ran without issue and my workers had routes! Lorenzo did a great job with this playbook and I like his use of with_flattened (this was new to me) and map to generate the list of node addresses. While this solution isn’t suitable for production, it’s a great way to get an HA test cluster up and operational.
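For the curious, the end result of the playbook is just one static route per peer node on each worker, pointing that node’s pod CIDR at its InternalIP. Typed by hand, the equivalent on the 192.168.2.44 worker would look something like this (illustrative, based on the node/CIDR mapping above, not the playbook’s literal tasks):

# one route per peer node; 10.1.4.0/24 is local to this worker so it needs no route
$ sudo ip route add 10.1.0.0/24 via 192.168.2.45
$ sudo ip route add 10.1.2.0/24 via 192.168.2.46
$ sudo ip route add 10.1.3.0/24 via 192.168.2.47
$ sudo ip route add 10.1.1.0/24 via 192.168.2.48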
This week I needed to write a couple of scripts to pattern match some strings in a text file. My typical go-to for these types of problems is python, but I wanted to learn something new so I started poking around bash(1) to see if there was a way to do this in a shell script. After a bit of reading, I came across the =~ operator, which provides a way to pattern match in the shell:
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise.
To test this I created the following bash function:
findMatch() {
    if [[ "${1}" =~ "is a" ]]; then
        echo "The string \"is a\" was found in string ${1}"
    else
        echo "The string \"is a\" was not found in string ${1}"
    fi
}
Running findMatch with the argument “This is a string” turned up a match:
The string "is a" was found in string This is a string
Pattern matching is case-sensitive by default, so it won’t match the string “This Is A string”:
The string "is a" was not found in string This Is A string
If you need your matches to be case insensitive you can set nocasematch with shopt:
shopt -s nocasematch
If we call findMatch with the same string as above we will now get a match:
The string "is a" was found in string This Is A string
This is a super useful feature to add to the shell scripting utility belt!
This evening while building out a new cluster I came across another fun kube-proxy error:
Jan 17 20:15:53 kubworker5.prefetch.net kube-proxy.v1.9.0[23071]: E0117 20:15:53.807410 23071 proxier.go:1701] Failed to delete stale service IP 10.2.0.10 connections, error: error deleting connection tracking state for UDP service IP: 10.2.0.10, error: error looking for path of conntrack: exec: "conntrack": executable file not found in $PATH
The message was relatively straightforward. My kube-proxy daemon couldn’t find the conntrack executable it needed to remove the stale service entries. If you aren’t familiar with conntrack(8), the manual page has a solid description:
conntrack provides a full featured userspace interface to the netfilter connection tracking system that is intended to replace the old /proc/net/ip_conntrack interface. This tool can be used to search, list, inspect and maintain the connection tracking subsystem of the Linux kernel. Using conntrack, you can dump a list of all (or a filtered selection of) currently tracked connections, delete connections from the state table, and even add new ones.
What perplexed me about this was the use of exec() to interface with conntrack. I had been under the assumption that Kubernetes used the native APIs exposed to userland through the netfilter conntrack shared library. After 20 minutes of reading code I came across the ClearUDPConntrackForIP function in conntrack.go, which cleared that up:
err := ExecConntrackTool(execer, parameters...)
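The fix is simply to install the distribution’s conntrack package on the workers. The package names below are the common ones, but they may differ per distribution:

# confirm whether the binary is already on the worker's PATH
$ which conntrack || echo "conntrack is not installed"
# RHEL/CentOS workers
$ sudo yum install -y conntrack-tools
# Debian/Ubuntu workers
$ sudo apt-get install -y conntrack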
Installing the conntrack executable on my workers cleared up the issue and my service was removed. I’m learning that the only way to truly learn Kubernetes is by reading code. And there’s a LOT of code. :)
I’ve been spending a good amount of my spare time trying to learn the ins and outs of kubernetes and terraform. To really get the gist of how Kubernetes works under the covers I’ve been automating Kubernetes The Hard Way with terraform and ansible. There are a couple of dependencies in the Kubernetes world. One dependency is the control plane’s reliance on etcd. After configuring and starting my etcd cluster I wanted to check the cluster health before moving forward. You can retrieve the health status of an etcd node with the endpoint health option:
$ etcdctl endpoint health
http://127.0.0.1:2379 is healthy: successfully committed proposal: took = 651.381µs
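The --endpoints flag also takes a comma-separated list, so you can health check every member in one shot (the controller addresses below are hypothetical):

$ etcdctl --endpoints=http://192.168.2.41:2379,http://192.168.2.42:2379,http://192.168.2.43:2379 endpoint health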
Ansible provides a really cool feature to assist with these situations: the do-until loop. The do-until loop allows you to run a command a fixed number of times (the retries parameter controls the number of attempts) and continue once the until condition is met. In my case I had ansible check for ‘is healthy’ in the stdout:
---
- hosts: kubcontrollers
  tasks:
    - shell: etcdctl --endpoints=[http://127.0.0.1:2379] endpoint health
      register: result
      until: result.stdout.find("is healthy") != -1
      retries: 5
      delay: 10
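For comparison, a rough shell equivalent of the do-until loop above (retry up to 5 times with a 10 second delay) would look like this:

# keep checking until etcd reports healthy, giving up after 5 attempts
for attempt in 1 2 3 4 5; do
    etcdctl --endpoints=http://127.0.0.1:2379 endpoint health 2>&1 | grep -q "is healthy" && break
    sleep 10
done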
I’ve read through a few playbooks that use this to accomplish rolling restarts and upgrades. Nifty feature!
This past weekend while bootstrapping a new kubernetes cluster my kubelets started logging the following error to the systemd journal:
Dec 30 10:26:10 kubworker1.prefetch.net kubelet[1202]: E1230 10:26:10.862904 1202 kubelet_node_status.go:106] Unable to register node "kubworker1.prefetch.net" with API server: nodes "kubworker1.prefetch.net" is forbidden: node "kubworker1" cannot modify node "kubworker1.prefetch.net"
Secure kubernetes configurations use client certificates along with the node name to register with the control plane. My kubeconfig file contained a short name:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: STUFF
    server: https://apivip:443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:node:kubworker1
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: system:node:kubworker1
  user:
    as-user-extra: {}
    client-certificate-data: STUFF
    client-key-data: STUFF
But the hostname assigned to the machine was fully qualified:
$ uname -n
kubworker1.prefetch.net
After re-reading the documentation I found there are two ways to address this. You can re-generate your certificates with the FQDN of your hosts or override the name with the kubelet ‘--hostname-override=NAME’ command line option. Passing the short name to the kubelet ‘--hostname-override’ option provided a quick fix and allowed my host to register:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubworker1 Ready <none> 13m v1.9.0
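If you aren’t sure which name the API server expects, comparing the CN embedded in the kubelet’s client certificate against the machine’s hostname will tell you. A quick check, assuming the certificate lives under /var/lib/kubelet (adjust the path for your layout):

$ openssl x509 -in /var/lib/kubelet/kubworker1.pem -noout -subject
$ uname -n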
I need to do some additional digging to see what the best practices are for kubernetes node naming. That will go on my growing list of kubernetes questions to get answered.