Exporting BIND query statistics through XML and JSON

BIND 9.10 and later ship with a statistics server that exports a number of useful metrics through a web UI, XML and JSON. The statistics server is configured with the “statistics-channels” directive, which specifies the IP address and port to export statistics on and an ACL to control who can read statistics from the server. Here is a sample configuration for reference:

acl "stats_hosts" {
        192.168.1.0/24;
};

statistics-channels { 
        inet 10.10.0.1  port 8080 allow { stats_hosts; }; 
};

Once the statistics server is enabled you can view the statistics in a web browser by surfing to the IP and port the server is listening on. To retrieve the statistics as XML or JSON you can append “/xml” or “/json” to the URL:

# Retrieve statistics through XML
$ curl http://bind:8080/xml

# Retrieve statistics through JSON
$ curl http://bind:8080/json

The statistics server exports several useful metrics. To view everything you can pipe the output of curl to jq:

$ curl -s http://bind:8080/json | jq '.' | more
{
  "json-stats-version": "1.2",
  "boot-time": "2017-09-10T13:24:35.411Z",
  "config-time": "2017-09-10T13:24:35.484Z",
  "current-time": "2017-09-10T13:35:44.401Z",
  "version": "9.11.2",
  "opcodes": {
    "QUERY": 389,
    "IQUERY": 0,
    "STATUS": 0,
    .....

If you want to get specific fields you can adjust the filter passed to jq. To get just the query response codes you can retrieve the rcodes field:

$ curl -s http://bind:8080/json | jq '.rcodes'
{
  "NOERROR": 307,
  "FORMERR": 0,
  "SERVFAIL": 0,
  "NXDOMAIN": 0,
  "NOTIMP": 0,
  "REFUSED": 0,
  .....
}

To get the types of queries sent to the server you can retrieve qtypes:

$ curl -s http://bind:8080/json | jq '.qtypes'
{
  "A": 369,
  "NS": 1,
  "PTR": 1,
  "MX": 1,
  "AAAA": 11
}

To get overall name server statistics you can grab nsstats:

$ curl -s http://bind:8080/json | jq '.nsstats'
{
  "Requestv4": 385,
  "ReqEdns0": 361,
  "RecQryRej": 8,
  "Response": 385,
  "RespEDNS0": 361,
  "QrySuccess": 369,
  "QryAuthAns": 17,
  "QryNoauthAns": 360,
  "QryNxrrset": 8,
  "QryRecursion": 3,
  "QryFailure": 8,
  "QryUDP": 377
}

The statistics server also exports zone data along with zone, network and memory statistics. Funneling this data into Metricbeat or Prometheus and using Kibana or Grafana to visualize it can provide some amazing insight into your DNS infrastructure.
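
As a crude example of what that funneling can look like, here is a minimal shell sketch that polls the statistics server every 10 seconds and emits one timestamped JSON document per sample. The bind:8080 endpoint and the rcodes, qtypes and nsstats fields come from the examples above; everything else (the interval and the output shape) is arbitrary and would be dictated by your collector:

#!/bin/sh
# Poll the BIND statistics server and emit one timestamped JSON
# document per sample. Ship the output to your collector of choice.
while true; do
    curl -s http://bind:8080/json | \
        jq --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
           '{timestamp: $ts, rcodes: .rcodes, qtypes: .qtypes, nsstats: .nsstats}'
    sleep 10
done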

Using elastic’s metricbeat to collect system utilization data

Over the past month I’ve been evaluating metricbeat. Metricbeat and the rest of the ELK stack are incredibly powerful tools for deriving meaning from metrics and unstructured log data. Metricbeat allows you to funnel system and application metrics (e.g., CPU utilization, number of HTTP GET requests, number of SQL queries, HTTP endpoint response times, etc.) into elasticsearch, and the powerful kibana visualization tool can then be used to make sense of them.

To get up and running with metricbeat you will first need to configure your logstash infrastructure to accept incoming beats. Once logstash is accepting beats you will need to install the metricbeat daemon on each system you want to collect metrics from and configure it through the YAML file /etc/metricbeat/metricbeat.yml.
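
Getting logstash ready for beats is usually just a matter of enabling the beats input plugin. Here is a minimal sketch; the file path is the conventional location and 5044 is the customary beats port, so adjust both to match your environment:

# /etc/logstash/conf.d/10-beats-input.conf
input {
  beats {
    port => 5044
  }
}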

The first section in metricbeat.yml tells metricbeat which metricsets to collect. Currently there are metricsets for load average, CPU, disk, memory, network and process utilization. To enable a metricset you need to make sure it isn’t commented out. The following snippet tells metricbeat to collect CPU, memory and network metrics every 10 seconds:

- module: system
  metricsets:
    # CPU stats
    - cpu
    # Memory stats
    - memory
    # Network stats
    - network
  enabled: true
  period: 10s
  processes: ['.*']

Getting the collection period right is definitely an art. Collecting metrics too frequently will increase system load and can skew the meaning of the metrics you are collecting. Not sampling often enough can hide short-lived problems. You will definitely need to experiment to find the collection interval that is optimal for your environment.

The next section in the file contains one or more outputs. These control where metrics are sent. Metrics can be sent directly to elasticsearch if you don’t need to do any processing. You can also route them to logstash and apply one or more filters to the metrics prior to placing them in an elasticsearch index. The following snippet shows how to ship metrics to elasticsearch over SSL with authentication:

output.elasticsearch:
  hosts: ["https://elastic.my.domain:9200"]
  username: "metricdata"
  password: "WOULDNTYOULIKETOKNOW"
  index: "metricbeat"
  ssl.certificate_authorities: ["/elk/certs/ca.pem"]
  ssl.certificate: "/elk/certs/cert.pem"
  ssl.key: "/elk/certs/cert.key"
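
If you would rather route the metrics through logstash so you can filter them first, the output stanza is similar. Here is a minimal sketch that assumes logstash is listening for beats on port 5044; the hostname is a placeholder:

output.logstash:
  hosts: ["logstash.my.domain:5044"]
  ssl.certificate_authorities: ["/elk/certs/ca.pem"]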

Once the configuration is in place you can start metricbeat with systemctl:

$ systemctl enable metricbeat && systemctl start metricbeat
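
To verify that documents are actually making it into elasticsearch you can run a quick document count against the index. This is a rough sketch that reuses the hostname, index name, user and CA certificate from the output example above (curl will prompt for the password):

$ curl -s -u metricdata --cacert /elk/certs/ca.pem \
    "https://elastic.my.domain:9200/metricbeat/_count?pretty"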

If the daemon starts up you will see metric data in the metricbeat index (assuming that is the index you are using for beat data) in Kibana. In the next couple of posts I’ll show some of the visualizations I’ve used to track down some really weird problems. It’s AMAZING how easy it is to find issues once all of your metric data is in a single location.

Install metricbeat with ansible and the elastic yum repository

Last month I started playing with elastic’s metricbeat and you could say I fell in love at first beat. I’ve created some amazing visualizations with the metrics it produces and am blown away by how much visibility I can get from correlating disparate event streams. A good example of this is being able to see VMware hypervisor utilization, system utilization and HTTP endpoint latency stacked on top of each other. Elastic hosts a yum repository for metricbeat, and it’s easy to deploy the package to Fedora-derived servers with ansible’s templating capabilities and the yum_repository module.

The following tasks from my metricbeat role create a metricbeat yum repository configuration file, install the metricbeat package, deploy a templated metricbeat configuration file, and then enable and start the service:

---
# tasks file for metricbeat
- name: Add metricbeat repository
  yum_repository:
    name: metricbeat
    description: Beats Repo
    baseurl: https://artifacts.elastic.co/packages/5.x/yum
    gpgkey: https://packages.elastic.co/GPG-KEY-elasticsearch
    gpgcheck: yes
    enabled: yes
    owner: root
    group: root
    state: present
    mode: 0600

- name: Install metricbeat package
  package:
    name: "{{ item }}"
    state: present
  with_items:
    - metricbeat

- name: Create metricbeat configuration file
  template:
    src: metricbeat.yml.j2
    dest: /etc/metricbeat/metricbeat.yml
    owner: root
    group: root
    mode: 0644

- name: Enable the metricbeat systemd service
  systemd:
    name: metricbeat
    enabled: yes
    state: started
    daemon_reload: yes

I’ve simplified the example to illustrate how easy it is to get up and running with metricbeat. Error handling and conditional restart logic are missing from the example above.
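
Applying the role is then just a matter of referencing it from a playbook and running it against the hosts you care about. Here is a minimal sketch (the playbook path and host group are hypothetical):

# playbooks/metricbeat.yml
---
- hosts: all
  become: true
  roles:
    - metricbeat

$ ansible-playbook -l webservers playbooks/metricbeat.yml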

Using ansible’s templating capabilities to deliver a keepalived configuration file

I’ve become a huge fan of ansible’s templating capabilities over the past few months. If you haven’t used them, they allow you to control the content of a file that is delivered to a system. Templates can contain variable names that get filled in with well known values, you can use math operations and various filters to derive values, and all of this can be wrapped in logic statements to control when and where it occurs.

To illustrate this, let’s say we are looking to stand up a fault tolerant haproxy cluster and want to use keepalived to control the virtual IPs that float between servers. You could create one configuration file per server and then push each one to the appropriate server with the copy module. That is a textbook duplication anti-pattern and adds more maintenance over the long term. A better approach is to create one configuration file and fill it in with variables that are unique to each server. These unique values could be the name of the host, its primary interface, the EC2 auto scaling group, etc. Ansible makes this crazy easy with templates.

Ansible templates are built on top of the amazing Jinja2 templating language. The language allows you to do things like format data, perform math and set operations, calculate random values, fill in variables when a logic operation succeeds, etc. I won’t go into any additional detail on the language since the official ansible and Jinja2 documentation are solid!

Now back to the keepalived example. Let’s say I want to create a unique keepalived.conf on each server. The ansible template module can take a template file we created, process it with Jinja2 and then spit out a unique configuration file on each server. A template module task takes the following basic form:

- name: Create keepalived configuration file
  template:
    src: keepalived.conf.j2
    dest: /etc/keepalived/keepalived.conf
    owner: root
    group: root
    mode: 0600

In this example the template file named keepalived.conf.j2 will be processed and a file named /etc/keepalived/keepalived.conf will be created on the server or servers the play is run against. The keepalived.conf.j2 file is a standard configuration file that contains variables (enclosed in mustaches) and logic (e.g., for foo in bar …). To fill in the keepalived.conf global_defs section we can create a couple of variables to define well known values (this gets powerful when you define variables in one place and use them throughout your roles and playbooks):

keepalived_email_to: root
keepalived_email_from: root
keepalived_smtp_server: localhost

These can be combined with the well-known ansible_fqdn variable to give something similar to this:

global_defs {

   # Send an e-mail to each of the following
   # addresses when a failure occurs
   notification_email {
       {{ keepalived_email_to }}
   }
   # The address to use in the From: header
   notification_email_from {{ keepalived_email_from }}

   # The SMTP server to route mail through
   smtp_server {{ keepalived_smtp_server }}

   # How long to wait for the mail server to respond
   smtp_connect_timeout 30

   # A descriptive name describing the router
   router_id vrrp-director-{{ ansible_fqdn }}
}

If this play was run against a server named haproxy01 we would get the following global configuration:

global_defs {

   # Send an e-mail to each of the following
   # addresses when a failure occurs
   notification_email {
       root
   }
   # The address to use in the From: header
   notification_email_from root

   # The SMTP server to route mail through
   smtp_server localhost

   # How long to wait for the mail server to respond
   smtp_connect_timeout 30

   # A descriptive name describing the router
   router_id vrrp-director-haproxy01
}

That’s handy, and it allows you to create a single configuration file with unique values for each system. To continue on with our HA keepalived setup, let’s say it needs to manage two virtual IP addresses and we want each server to master one of them. Once again we could hard code the values in multiple configuration files, or we can use a bit of logic to create unique vrrp_instances on each server. The following snippet shows an example of this:

{% for ip_address in vars['keepalived_virtual_ipaddresses'] %}
# Create a VRRP instance
vrrp_instance vrrp-director-{{ ansible_fqdn }}-{{ loop.index }} {
    # The initial state to transition to. This option isn't
    # really all that valuable, since an election will occur
    # and the host with the highest priority will become
    # the master. The priority is controlled with the priority
    # configuration directive.
    state MASTER

    # The interface keepalived will manage
    interface {{ ansible_default_ipv4.interface }}

{% set router_id = keepalived_initial_router_id + loop.index %}
    # The virtual router id number to assign the routers to
    virtual_router_id {{ router_id }}

{% set node1 = groups["haproxyservers"][0:1] | join(" ") %}
{% set node2 = groups["haproxyservers"][1:2] | join(" ") %}

{% if loop.index % 2 %}
  {% if inventory_hostname == node1 %}
    {% set priority = 1 %}
  {% else %}
    {% set priority = 2 %}
  {% endif %}
{% else %}
  {% if inventory_hostname == node2 %}
    {% set priority = 1 %}
  {% else %}
    {% set priority = 2 %}
  {% endif %}
{% endif %}

    # The priority to assign to this device. This controls
    # who will become the MASTER and BACKUP for a given
    # VRRP instance.
    priority {{ priority }}

    # How many seconds to wait until a gratuitous arp is sent
    garp_master_delay 10

    # How often to send out VRRP advertisements
    advert_int 1

    # Execute a notification script when a host transitions to
    # MASTER or BACKUP, or when a fault occurs. The arguments
    # passed to the script are:
    #  $1 - "GROUP"|"INSTANCE"
    #  $2 = name of group or instance
    #  $3 = target state of transition
    # Sample: VRRP-notification.sh VRRP_ROUTER1 BACKUP 100
    # notify "/usr/local/bin/VRRP-notification.sh"

    # Send an SMTP alert during a state transition
    smtp_alert

    # Authenticate the remote endpoints with a shared
    # secret key
    authentication {
        auth_type AH
        auth_pass {{ keepalived_auth_key }}
    }
    # The virtual IP addresses to float between nodes. The
    # label statement can be used to bring an interface 
    # online to represent the virtual IP.
    virtual_ipaddress {
        {{ ip_address }}/32 dev {{ ansible_default_ipv4.interface }}
    }
}
{% endfor %}

In the template above I am iterating over one or more IPs defined in the keepalived_virtual_ipaddresses variable and building a unique vrrp_instance stanza for each one. The physical interface is assigned based on the value of the well-known ansible_default_ipv4.interface variable, the virtual_router_id is assigned dynamically for each stanza, and the priority value (which controls who owns the IP initially) is generated on the fly with a modulus operation. I’m still learning everything there is to know about Jinja2 and I’m sure I will refactor this in a couple of months once I come across a better way to do it. This blog post is more of a reference to myself than anything else.
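
For completeness, the variables referenced in the template might be defined in a group_vars file similar to the following (the values are purely illustrative):

# group_vars/haproxyservers.yml
keepalived_virtual_ipaddresses:
  - 192.168.1.200
  - 192.168.1.201
keepalived_initial_router_id: 50
keepalived_auth_key: SOMESECRET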

Getting the ansible yum module to work on Fedora servers

I was doing some testing this morning on a Fedora 25 host and received the following error when I tried to execute a playbook:

$ ansible-playbook --ask-become-pass -l tbone playbooks/system-base.yml

PLAY [all] ********************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************
ok: [tbone]

TASK [system-common : upgrade all packages] ***********************************************************************
fatal: [tbone]: FAILED! => {"changed": false, "failed": true, "msg": "python2 yum module is needed for this  module"}
	to retry, use: --limit @/ansible/playbooks/system-base.retry

PLAY RECAP ********************************************************************************************************
tbone        : ok=1    changed=0    unreachable=0    failed=1   

To see what ansible was doing I set the ANSIBLE_KEEP_REMOTE_FILES environment variable, which keeps the generated ansiballz modules on the remote host (this is super useful for debugging problems). After reviewing the files in the temporary task directory I noticed that the playbook had a task to install a specific version of a package with yum. Yum doesn’t exist on newer Fedora releases, hence the “python2 yum module” error.
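
Setting the variable is as simple as prefixing the ansible-playbook run (this reuses the host and playbook from the transcript above):

$ ANSIBLE_KEEP_REMOTE_FILES=1 ansible-playbook --ask-become-pass -l tbone playbooks/system-base.yml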

There are a couple of ways to fix this. The ideal fix is to use the package module, or to check the OS release and use the ansible dnf module instead of yum. If you need a quick fix you can shell out from your playbook and install python2-dnf prior to gathering facts. If you need an even quicker fix you can install the package by hand:

$ dnf -y install python2-dnf

I’m currently using the package module and it works like a champ.
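
For reference, a task that uses the package module looks something like this (a minimal sketch; the package name is just an example):

- name: Install tcpdump with the distribution's package manager
  package:
    name: tcpdump
    state: present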

Viewing ansible variables

When developing ansible playbooks and roles it’s extremely useful to be able to see all of the variables available to you. This is super easy with the ansible setup and debug modules:

# List all of the vars available to the host
$ ansible 'haproxy01.*' -m setup

# Retrieve all of the groups from the inventory file
$ ansible 'haproxy01.*' -m debug -a "var=groups"

Lester Wade took this a step further and wrote a great blog entry that describes how to dump the contents of the vars, environment, group_names, hostvars and group variables to a file. If you run his example you will get a nicely formatted text file in /tmp/ansible.all:

Module Variables ("vars"):
--------------------------------
{
    "ansible_all_ipv4_addresses": [
        "192.168.1.122",
        "192.168.1.124"
    ],
    "ansible_all_ipv6_addresses": [
        "fe80::250:56ff:fe8f:b8ad"
    ],
    "ansible_apparmor": {
        "status": "disabled"
    },

This file is a great reference and kudos to Lester for the amazing work!
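
If you just want a quick and dirty version of the same idea, a single copy task will dump a host's variables to a file you can inspect later (a rough sketch, not Lester's exact playbook; the destination path is arbitrary):

- name: Dump host variables to a file for inspection
  copy:
    content: "{{ hostvars[inventory_hostname] | to_nice_json }}"
    dest: /tmp/ansible_vars.json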