Using external-dns to manage DNS entries in Kubernetes clusters

This article was posted on 2020-01-28 11:36:59 -0500

Kubernetes provides a Service resource to distribute traffic across one or more pods. I won’t go into detail on what a Service is, since it’s covered in depth elsewhere. For Internet-facing applications, this Service will typically be of type LoadBalancer. If you are running in the “cloud,” creating a Service of type LoadBalancer will trigger cloud-provider-specific logic to provision an external load balancer (either private or public) that targets your Service. Once the load balancer is provisioned, your cloud provider will return a long DNS name to represent the load balancer endpoint:

$ kubectl get svc

NAME            TYPE           CLUSTER-IP       EXTERNAL-IP                                                             PORT(S)          AGE
matty-service   LoadBalancer   1.2.3.4          XXXXXXXXXXXXXXXXXXXXXXXX-XXXXXXXXXX.us-east-1.elb.amazonaws.com         8081:32759/TCP   3m16s

If you are hosting services for customers, you don’t want them to use that long string to access your service. The DNS name also won’t work when you are using name-based virtual hosting or an Ingress that contains hostnames. What you want is to hand your customer a vanity name (e.g., example.com) that maps to the long load balancer FQDN through DNS. You could manually go into your provider’s DNS service to create this mapping, or you can let external-dns take care of that for you!

If you are still reading, external-dns may sound enticing to you. To get started, you will first need to grab the deployment manifests for your DNS provider. In my case, I am using Route53, so I installed the RBAC manifests after reviewing them. External-dns also needs to make API calls to your cloud provider to list and update DNS records. This requires permissions, which the installation guides cover in depth.

External-dns is highly configurable, and has numerous flags to control how it will manage DNS updates. I would suggest amending the defaults based on your risk tolerance and operational practices. At a minimum, I would suggest reviewing the following flags:

--aws-prefer-cname          - Create CNAME records instead of ALIAS records in Route53.
--namespace=""              - Limit the namespaces external-dns looks for annotations in.
--publish-internal-services - Publish DNS for ClusterIP services.
--provider=aws              - The provider you need external-dns to work with.
--policy=upsert-only        - Controls how records are synchronized.
--txt-owner-id="default"    - The name to assign to this external-dns instance.
--txt-prefix=""             - Custom string prepended to the name of each ownership TXT record.
--domain-filter=domains     - The list of domains external-dns should operate on.

One flag to think through is “--policy”. This controls how external-dns will manage records when services are added and removed. External-dns has three modes of operation: sync, upsert-only, and create-only. These are described in policy.go:

// Policies is a registry of available policies.
var Policies = map[string]Policy{
    "sync":        &SyncPolicy{},
    "upsert-only": &UpsertOnlyPolicy{},
    "create-only": &CreateOnlyPolicy{},
}

// SyncPolicy allows for full synchronization of DNS records.
type SyncPolicy struct{}

// Apply applies the sync policy which returns the set of changes as is.
func (p *SyncPolicy) Apply(changes *Changes) *Changes {
    return changes
}

// UpsertOnlyPolicy allows everything but deleting DNS records.
type UpsertOnlyPolicy struct{}
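
To make the difference between the three policies concrete, here is a rough Python model of how each one might filter a planned set of changes. It is a simplification and not the upstream Go code: the Changes class, its three fields, and the function names are made up for illustration, and the create-only behavior (only new records pass through) is my reading of the policy's name and should be checked against policy.go.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Changes:
    """Hypothetical stand-in for the set of planned DNS changes."""
    create: List[str] = field(default_factory=list)
    update: List[str] = field(default_factory=list)
    delete: List[str] = field(default_factory=list)


def sync(changes: Changes) -> Changes:
    # Full synchronization: creates, updates, and deletes all pass through.
    return changes


def upsert_only(changes: Changes) -> Changes:
    # Everything except deletions passes through.
    return Changes(create=list(changes.create), update=list(changes.update))


def create_only(changes: Changes) -> Changes:
    # Assumption: only brand-new records pass through.
    return Changes(create=list(changes.create))


# Mirrors the shape of the Policies registry: a name mapped to a filter.
POLICIES: Dict[str, Callable[[Changes], Changes]] = {
    "sync": sync,
    "upsert-only": upsert_only,
    "create-only": create_only,
}
```

With a plan that contains one create, one update, and one delete, only "sync" would let the delete through, which is why upsert-only is the cautious choice if you are worried about records disappearing.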

I was concerned with entries being removed when I first started using external-dns, so I wanted to point this out (FWIW: a working backup and recovery solution eased my fears). To get external-dns to create DNS records for your service, you need to add an “external-dns.alpha.kubernetes.io/hostname” annotation with the DNS entry to create. Here is an example annotated service which will trigger the creation of matty.prefetch.net:

apiVersion: v1
kind: Service
metadata:
  name: matty-service
  annotations:
    external-dns.alpha.kubernetes.io/hostname: matty.prefetch.net
spec:
  selector:
    run: nginx-matty
  ports:
    - port: 80
      targetPort: 80
  type: LoadBalancer
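
If you want to audit which names a manifest will ask external-dns to create, a small helper along these lines can pull them out of a parsed Service. This is only a sketch: requested_hostnames is a hypothetical name, the manifest is assumed to be loaded into a plain dict already, and splitting the annotation on commas assumes external-dns accepts a comma-separated list of hostnames.

```python
from typing import List

HOSTNAME_ANNOTATION = "external-dns.alpha.kubernetes.io/hostname"


def requested_hostnames(service: dict) -> List[str]:
    """Return the hostnames a Service manifest asks external-dns to manage."""
    # A manifest may have no annotations at all, or an explicit null.
    annotations = service.get("metadata", {}).get("annotations") or {}
    raw = annotations.get(HOSTNAME_ANNOTATION, "")
    # Assumption: several hostnames may be given as a comma-separated list.
    return [name.strip() for name in raw.split(",") if name.strip()]
```

For the manifest above this would return a single entry, matty.prefetch.net, and an empty list for a Service without the annotation.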

After the DNS entry is created, the external-dns pod will log a message to indicate the record was created:

$ kubectl logs external-dns-XXXXX

time="2020-01-27T20:28:35Z" level=info msg="Desired change: CREATE matty.prefetch.net A [Id: /hostedzone/XXXXXXXXXXX]"
time="2020-01-27T20:28:35Z" level=info msg="Desired change: CREATE matty.prefetch.net TXT [Id: /hostedzone/XXXXXXXXXXXX]"
time="2020-01-27T20:28:35Z" level=info msg="2 record(s) in zone prefetch.net. [Id: /hostedzone/XXXXXXXXX] were successfully updated"

In the example above, external-dns created a Route53 ALIAS record pointing matty.prefetch.net to the load balancer DNS name returned by the cloud provider. It also created a TXT ownership record to indicate that external-dns owns the entry:

$ dig +short matty.prefetch.net txt

"heritage=external-dns,external-dns/owner=my-hostedzone-identifier,external-dns/resource=service/default/matty-service"

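The ownership value is a comma-separated list of key=value pairs, so it is easy to pick apart if you want to script a check on who owns a record. A minimal sketch, assuming the value has the same shape as the dig output above (parse_ownership_record is a hypothetical helper name):

```python
from typing import Dict


def parse_ownership_record(txt: str) -> Dict[str, str]:
    """Split a TXT ownership value of the form k=v,k=v into a dict."""
    fields = {}
    # dig prints TXT data wrapped in double quotes, so strip them first.
    for pair in txt.strip().strip('"').split(","):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields
```

The "external-dns/owner" field should match the value passed to --txt-owner-id, and "external-dns/resource" names the Kubernetes object that requested the record.
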
To verify that the entry resolves, you can run dig:

$ dig +short matty.prefetch.net

34.204.233.20
52.87.68.17

The IPs returned should be the same ones returned if you resolve the load balancer DNS name:

$ dig +short XXXXXXXXXXXXXXXXXXXXXXXX-XXXXXXXXXX.us-east-1.elb.amazonaws.com

34.204.233.20
52.87.68.17
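
The same comparison can be scripted. The sketch below resolves both names with the system resolver and compares the address sets, so ordering differences don't matter. The function names are hypothetical, and keep in mind that load balancer addresses can change over time, so a mismatch right after a change is not necessarily an error.

```python
import socket
from typing import Callable, Set


def resolve_ipv4(name: str) -> Set[str]:
    """Return the set of IPv4 addresses the system resolver gives for name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def same_targets(
    vanity: str,
    load_balancer: str,
    resolve: Callable[[str], Set[str]] = resolve_ipv4,
) -> bool:
    """True when both names currently resolve to the same set of addresses."""
    return resolve(vanity) == resolve(load_balancer)
```

The resolve parameter exists only so the comparison can be exercised with canned data instead of live DNS.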

Now the big question! Would I run this in production? I’m not sure yet. Currently, I’m using it to provision minimized EKS clusters for developers, and that is working well. There are some large organizations using it, but there are a few GitHub issues that concern me. Once I get a bit more comfortable with it, I won’t hesitate to use it in production. The code is readable and well organized, and the community is active. Those are always good signs!

Notes from episode 70 of TGIK: Assuming AWS roles with kube2iam/kiam

This article was posted on 2020-01-27 01:00:00 -0500

Over the past few months I’ve been trying to learn everything there is to know about Kubernetes. Kubernetes is an amazing technology for deploying and scaling containers, though it comes with a cost. It’s an incredibly complex piece of software, and there are a ton of bells and whistles to become familiar with. One way I’ve found to come up to speed is Joe Beda’s weekly TGIK live broadcast. It airs each Friday at 4PM EST and is chock-full of fantastic information. In episode seventy Joe discusses kiam and kube2iam.

Here are some of my takeaways from the episode:

The AWS metadata server exposes various attributes about the VM.
The metadata server can be reached from the VM on the link-local address http://169.254.169.254:
$ curl http://169.254.169.254
You can get your current identity with the AWS Security Token Service (STS) get-caller-identity command:
$ aws sts get-caller-identity
AWS roles can be assumed by a service or user via sts:AssumeRole.
aws-vault allows you to access credentials during development: https://github.com/99designs/aws-vault
Roles contain two policies. One defines who can assume the role, and the other defines what the role can do.
Trust relationships define who can assume a role.
The “Principal” in the trust relationship contains the list of ARNs that can assume this role.
You can assume a role with aws sts assume-role --role-arn arn://…../foo/bar --role-session-name foobar
Kube2iam intercepts calls to the metadata API server and proxies them to AWS.
One downside to kube2iam is the need to attach every possible role to every worker.
Kiam runs as a client/server pair. The server doles out roles and the agent requests them.
Kiam also works by interposing itself between the pod and the metadata server.
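
To tie the notes on trust relationships together, here is a small sketch that builds the trust policy document for a role: the statement that says which principals may call sts:AssumeRole. The helper names are hypothetical, and any ARN you pass in (for example, the role attached to your worker nodes) is your own value, not something taken from the episode.

```python
import json
from typing import List


def trust_policy(principal_arns: List[str]) -> dict:
    """Build a trust policy letting the given principals assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The Principal lists who is trusted to assume this role.
                "Principal": {"AWS": principal_arns},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trust_policy_json(principal_arns: List[str]) -> str:
    """Render the trust policy as indented JSON, ready to paste or upload."""
    return json.dumps(trust_policy(principal_arns), indent=2)
```

The second policy on the role, the one describing what the role can do, is a separate permissions document and is not covered by this sketch.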

Verifying your .gitignore is working correctly

This article was posted on 2020-01-27 00:00:00 -0500

I was recently cleaning up an old Git repo, and noticed that some .pyc files got checked in. This got me thinking, and I started reading through the Git documentation to see if there was a way to evaluate .gitignore rules to make sure they were working as expected. Sure enough, Git has the “check-ignore” command. Given the following .gitignore:

$ cat .gitignore

*.env
cluster*
*.pyc

You can pass one or more paths to “check-ignore” to see which of them are ignored. In the examples below, the shell expands each glob into the matching files in your working directory before Git sees it:

$ git check-ignore cluster*

cluster1
cluster1.env

$ git check-ignore *.env

cluster1.env

Super handy! After further review, the issue turned out to be a typo in the .gitignore.
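
When a rule isn't matching the way you expect, the verbose form (git check-ignore -v) is useful because it reports which file, line, and pattern caused each match. The sketch below wraps that in Python. The class and function names are hypothetical, and the parser assumes simple paths that Git doesn't need to quote.

```python
import subprocess
from typing import List, NamedTuple


class IgnoreMatch(NamedTuple):
    source: str   # file the rule came from, e.g. .gitignore
    line: int     # line number of the rule in that file
    pattern: str  # the pattern that matched
    path: str     # the path that was checked


def parse_verbose(output: str) -> List[IgnoreMatch]:
    """Parse `git check-ignore -v` rows: source:line:pattern<TAB>path."""
    matches = []
    for row in output.splitlines():
        if not row:
            continue
        rule, _, path = row.partition("\t")
        source, line, pattern = rule.split(":", 2)
        matches.append(IgnoreMatch(source, int(line), pattern, path))
    return matches


def check_ignore(paths: List[str]) -> List[IgnoreMatch]:
    """Run git check-ignore -v on the given paths in the current repo."""
    result = subprocess.run(
        ["git", "check-ignore", "-v", "--", *paths],
        capture_output=True,
        text=True,
    )
    # Exit status 1 only means nothing was ignored; above that is a real failure.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return parse_verbose(result.stdout)
```

A path that comes back with no match is a hint that the rule meant to cover it has a typo.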

Observing Kubernetes kubectl API calls

This article was posted on 2020-01-26 00:00:00 -0500

Recently I spent some time digging into the Kubernetes API. This was an incredible experience, and it really helped me understand the various calls, how they are structured, and what they do. To observe the API calls made by kubectl, you can run it with the “-v10” option:

$ kubectl get po -v10

This will print a TON of information to your screen. To see the API calls generated by $(kubectl get po), you can grep the results for GET:

$ kubectl get po -v10 2>&1 | grep GET

I0126 12:43:18.308163   28626 round_trippers.go:443] GET https://FQDN/api/v1/namespaces/default/pods?limit=500 200 OK in 1077 milliseconds
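
If you collect a lot of these lines, it can help to break each request into its parts. The following is a minimal sketch that assumes log lines shaped like the one above; describe_request and the keys of the returned dict are names made up for illustration.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the HTTP verb and URL that kubectl logs for each request.
REQUEST_RE = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE) (https?://\S+)")


def describe_request(log_line: str) -> dict:
    """Extract the verb, path, namespace, and query parameters from a log line."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        raise ValueError("no HTTP request found in line")
    verb, url = match.groups()
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    namespace = None
    if "namespaces" in segments:
        idx = segments.index("namespaces")
        if idx + 1 < len(segments):
            namespace = segments[idx + 1]
    return {
        "verb": verb,
        "path": parts.path,
        "namespace": namespace,
        "last_segment": segments[-1],
        # Keep only the first value of each query parameter.
        "query": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }
```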

The API call to retrieve the list of pods contains the API version and the namespace to retrieve pods from, and the results are limited to 500 per request by default. What I personally found super useful was studying the JSON objects returned by the API server. The following command will pretty print the JSON responses:

$ kubectl get po -v10 2>&1 | grep 'Response Body:' | sed 's/^.*Response Body://' | jq '.' | more

{
  "kind": "Table",
  "apiVersion": "meta.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/api/v1/namespaces/cert-manager/pods/cert-manager-7f46f4ffdd-bkz5f",
    "resourceVersion": "1127"
  },
  "columnDefinitions": [
    {
      "name": "Name",
      "type": "string",
      "format": "name",
      "description": "Name must be unique within a namespace ...
      "priority": 0
    },
  ...
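
If you would rather post-process the responses in a script than in a shell pipeline, the same extraction can be sketched in Python. This assumes you saved the kubectl -v10 output to a string first, and that each body sits on one line after the "Response Body:" marker as in the pipeline above. The function name is hypothetical.

```python
import json
from typing import Any, Iterator

MARKER = "Response Body:"


def response_bodies(log_text: str) -> Iterator[Any]:
    """Yield each JSON response body found in kubectl -v10 output."""
    for line in log_text.splitlines():
        _, found, body = line.partition(MARKER)
        if not found:
            continue
        try:
            yield json.loads(body)
        except json.JSONDecodeError:
            # Skip bodies that are truncated or not JSON.
            continue
```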

Studying the responses to specific API calls and looking up the various fields has really helped me understand what is going on under the covers. If you want to learn more you should check out Making the Most Out of Kubernetes Audit Logs, as well as Duffie Cooley’s Grokking the Kubernetes API server series. When you need to debug weird issues, you will be glad you did!

Linting Jenkinsfiles to find syntax errors

This article was posted on 2019-12-18 21:47:16 -0500

As a long-time Jenkins user, I periodically need to add new steps or Groovy logic to my Jenkinsfiles. The last thing you want to do when updating your pipeline configuration is to make a typo that causes a build to break. To avoid these scenarios, I like to use a git pre-commit hook along with the Jenkins CLI “declarative-linter” command. To use this super useful feature to check for syntax errors, you will first need to download the Jenkins CLI client. You can do this with wget:

$ wget -O jenkins-cli.jar https://JENKINS_SERVER:JENKINS_PORT/jnlpJars/jenkins-cli.jar

Once the Java archive is downloaded, you can use the following syntax to check if a Jenkinsfile is structurally sound:

$ export API_TOKEN="RANDOM_FOO"

$ java -jar jenkins-cli.jar -auth ${LIMITED_PERM_USER}:${API_TOKEN} -s http://JENKINS_SERVER:JENKINS_PORT declarative-linter < Jenkinsfile

Dec 19, 2019 9:33:30 AM org.apache.sshd.common.util.security.AbstractSecurityProviderRegistrar getOrCreateProvider
INFO: getOrCreateProvider(EdDSA) created instance of net.i2p.crypto.eddsa.EdDSASecurityProvider
Jenkinsfile successfully validated.

$ echo $?

0

If the file looks good you will get a return code of 0 and the string “Jenkinsfile successfully validated” will be printed on the console. If you fat-fingered something (e.g., left out a parenthesis or semicolon), you will get a return code of 1 and the string “Errors encountered validating Jenkinsfile:” will be printed:

$ java -jar jenkins-cli.jar -auth ${LIMITED_PERM_USER}:${API_TOKEN} -s http://JENKINS_SERVER:JENKINS_PORT declarative-linter < Jenkinsfile

Dec 19, 2019 9:33:40 AM org.apache.sshd.common.util.security.AbstractSecurityProviderRegistrar getOrCreateProvider
INFO: getOrCreateProvider(EdDSA) created instance of net.i2p.crypto.eddsa.EdDSASecurityProvider
Errors encountered validating Jenkinsfile:
WorkflowScript: 39: unexpected token: } @ line 39, column 1.
   }
   ^

$ echo $?

1

In the case of an error, the linter will give you a breadcrumb to help you track down the issue. One important item to remember is that this checks the structure of the Jenkinsfile passed to STDIN. It won’t pick up logic errors in your Groovy code or the incorrect use of steps in your stages. But as a first line of defense it works pretty well. It also ensures you won’t be “the guy” that gets asked about TPS reports when your co-workers joke about the build being broken at the water cooler.
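
For the pre-commit hook itself, something along these lines could work. Treat it as a sketch under assumptions: the function names are made up, the environment variable names (JENKINS_CLI_JAR, JENKINS_URL, LIMITED_PERM_USER, API_TOKEN) are just a convention chosen here, and the command is the same java invocation shown earlier with the Jenkinsfile fed in on standard input.

```python
import os
import subprocess
from typing import Callable, List


def linter_command(jar: str, server: str, user: str, token: str) -> List[str]:
    """Build the jenkins-cli invocation used earlier in this post."""
    return [
        "java", "-jar", jar,
        "-auth", f"{user}:{token}",
        "-s", server,
        "declarative-linter",
    ]


def lint(
    jenkinsfile: str,
    command: List[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Feed the Jenkinsfile to the linter on stdin and return its exit code."""
    with open(jenkinsfile, "rb") as handle:
        result = run(command, stdin=handle)
    return result.returncode


def hook(jenkinsfile: str = "Jenkinsfile") -> int:
    """Entry point for a pre-commit hook; settings come from the environment."""
    command = linter_command(
        jar=os.environ.get("JENKINS_CLI_JAR", "jenkins-cli.jar"),
        server=os.environ["JENKINS_URL"],
        user=os.environ["LIMITED_PERM_USER"],
        token=os.environ["API_TOKEN"],
    )
    return lint(jenkinsfile, command)
```

To use it as a hook, save it as an executable .git/hooks/pre-commit script and end the file with raise SystemExit(hook()). Git aborts the commit whenever the hook exits non-zero, which is exactly what the linter returns when validation fails.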