When I first started learning how the Kubernetes networking model works I wanted to configure everything manually to see how the pieces fit together. This was a great learning experience and was easy to automate with Ansible, but the solution had a couple of downsides. If a machine is rebooted it loses the PodCIDR routes, since they aren't persisted to disk. It also doesn't add or remove routes for hosts as they join and leave the cluster. I wanted a more permanent and dynamic solution, so I started looking at the flannel host-gw and vxlan backends.
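For context, the manual configuration amounted to adding one static route per worker, pointing that worker's PodCIDR at its host IP. Here's a minimal sketch of what my Ansible play was running (the CIDR and addresses come from my lab and will obviously differ in yours):

$ sudo ip route add 10.1.1.0/24 via 192.168.2.46 dev ens192

Routes added this way live only in the kernel's in-memory routing table, which is exactly why they vanish after a reboot.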
The flannel host-gw option was the first solution I evaluated. This backend takes the PodCIDR addresses assigned to all of the nodes and creates routing table entries on each worker, so every worker knows which host to send pod traffic to. In addition, flanneld will NAT the cluster IPs to the host IP if a pod needs to contact a host outside of the local broadcast domain. The flannel daemon (flanneld) is deployed as a DaemonSet, so one pod (and one flanneld daemon) will be created on each worker. Setting up the flannel host-gw backend is ridiculously easy. To begin, you will need to download the deployment manifest from GitHub:
$ wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Once you retrieve the manifest you will need to change the backend Type in the net-conf.json block of the ConfigMap from vxlan to host-gw. We can use our good buddy sed to make the change:
$ sed 's/vxlan/host-gw/' -i kube-flannel.yml
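If you want to eyeball the change before applying it, you can grep out the net-conf.json block. After the edit it should look something like this (the Network value needs to match your cluster IP range, 10.1.0.0/16 in my case; the stock manifest ships with a different default):

$ grep -A 6 'net-conf.json' kube-flannel.yml
  net-conf.json: |
    {
      "Network": "10.1.0.0/16",
      "Backend": {
        "Type": "host-gw"
      }
    }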
To apply the configuration to your cluster you can use the kubectl create command:
$ kubectl create -f kube-flannel.yml
This will create several Kubernetes objects: a ServiceAccount for flannel, a ClusterRole and ClusterRoleBinding that let it read node information from the API server, a ConfigMap containing net-conf.json, and the kube-flannel-ds DaemonSet itself.
To verify the pods were created and are currently running we can use the kubectl get command:
$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE
kube-flannel-ds-42nwn 1/1 Running 0 5m 192.168.2.45 kubworker2.prefetch.net
kube-flannel-ds-49zvp 1/1 Running 0 5m 192.168.2.48 kubworker5.prefetch.net
kube-flannel-ds-t8g9f 1/1 Running 0 5m 192.168.2.44 kubworker1.prefetch.net
kube-flannel-ds-v6kdr 1/1 Running 0 5m 192.168.2.46 kubworker3.prefetch.net
kube-flannel-ds-xnlzc 1/1 Running 0 5m 192.168.2.47 kubworker4.prefetch.net
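Since flannel is deployed as a DaemonSet, another quick sanity check is to query the DaemonSet itself and make sure the desired, current and ready pod counts all equal the number of workers in your cluster (five in my case):

$ kubectl get daemonset kube-flannel-ds -n kube-system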
We can also use the kubectl logs command to review the messages flanneld logged while it was initializing:
$ kubectl logs -n kube-system kube-flannel-ds-t8g9f
I0220 14:31:23.347252 1 main.go:475] Determining IP address of default interface
I0220 14:31:23.347435 1 main.go:488] Using interface with name ens192 and address 192.168.2.44
I0220 14:31:23.347446 1 main.go:505] Defaulting external address to interface address (192.168.2.44)
I0220 14:31:23.357568 1 kube.go:131] Waiting 10m0s for node controller to sync
I0220 14:31:23.357622 1 kube.go:294] Starting kube subnet manager
I0220 14:31:24.357751 1 kube.go:138] Node controller sync successful
I0220 14:31:24.357771 1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - kubworker1.prefetch.net
I0220 14:31:24.357774 1 main.go:238] Installing signal handlers
I0220 14:31:24.357869 1 main.go:353] Found network config - Backend type: host-gw
I0220 14:31:24.357984 1 main.go:300] Wrote subnet file to /run/flannel/subnet.env
I0220 14:31:24.357988 1 main.go:304] Running backend.
I0220 14:31:24.358007 1 main.go:322] Waiting for all goroutines to exit
I0220 14:31:24.358044 1 route_network.go:53] Watching for new subnet leases
I0220 14:31:24.443807 1 route_network.go:85] Subnet added: 10.1.4.0/24 via 192.168.2.45
I0220 14:31:24.444040 1 route_network.go:85] Subnet added: 10.1.1.0/24 via 192.168.2.46
I0220 14:31:24.444798 1 route_network.go:85] Subnet added: 10.1.2.0/24 via 192.168.2.47
I0220 14:31:24.444883 1 route_network.go:85] Subnet added: 10.1.3.0/24 via 192.168.2.48
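The "Wrote subnet file" line above is worth a quick detour. flanneld writes the network settings it came up with to an environment file on each worker, and you can cat it to see what the node was assigned (the FLANNEL_SUBNET shown here is a hypothetical value; it will be your node's PodCIDR):

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.1.0.0/16
FLANNEL_SUBNET=10.1.0.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true

Note the MTU: host-gw routes packets natively rather than encapsulating them, so there is no tunnel overhead to subtract.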
To verify the PodCIDR routes were created we can log into one of the workers and run ip route show:
$ ip route show
default via 192.168.2.254 dev ens192 proto static metric 100
10.1.1.0/24 via 192.168.2.46 dev ens192
10.1.2.0/24 via 192.168.2.47 dev ens192
10.1.3.0/24 via 192.168.2.48 dev ens192
10.1.4.0/24 via 192.168.2.45 dev ens192
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.2.0/24 dev ens192 proto kernel scope link src 192.168.2.44 metric 100
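Each PodCIDR flanneld learned about shows up as a plain next-hop route through the owning worker's host IP. To exercise one of these routes end to end, you can ping a pod IP that lives on another worker (10.1.4.10 below is a hypothetical address from kubworker2's 10.1.4.0/24 range; substitute a real pod IP from kubectl get pods -o wide):

$ ping -c 2 10.1.4.10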
Sweet! My cluster IP range is 10.1.0.0/16 and the ip route output above shows the routes this worker will take to reach cluster IPs on other workers. Now if you're like me, you may be wondering: how does flannel create routes on the host when it's running in a container? Here's where the power of DaemonSets shines. Inside the deployment manifest flannel sets hostNetwork to true:
spec:
  hostNetwork: true
  nodeSelector:
    beta.kubernetes.io/arch: amd64
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
This allows the pod to access the host's network namespace, so when flanneld adds a route it lands directly in the worker's routing table rather than inside the container. There are a couple of items you should be aware of. First, the flannel manifest I downloaded from GitHub uses a flannel image from the quay.io repository. I'm always nervous about using images I don't generate from scratch (and validate w/ digital signatures) with automated build tools. Second, if we log into one of the flannel containers and run ps:
$ kubectl exec -i -t -n kube-system kube-flannel-ds-t8g9f ash
/ # ps auxwww
PID USER TIME COMMAND
1 root 0:00 /opt/bin/flanneld --ip-masq --kube-subnet-mgr
1236 root 0:00 ash
1670 root 0:00 ash
1679 root 0:00 ps auxwww
You will notice that flanneld is started with the "--kube-subnet-mgr" option. This option tells flanneld to retrieve the PodCIDR subnet assignments from the API server instead of reading them from etcd. It also causes flanneld to watch for network changes (hosts being added and removed) and adjust the host routes accordingly.
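Since the subnet assignments come straight from the Kubernetes node objects, you can view the same data flanneld consumes with a jsonpath one-liner (just a convenience query I use, not something flannel provides):

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

Each node's spec.podCIDR should line up with one of the routes we saw earlier. In a follow-up post I'll dig into vxlan and some techniques I found useful for debugging node-to-node communications.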