Using Ansible to verify remote file checksums with get_url, lookup() and stat


As an extremely security-minded operations guy, I take every precaution to verify that the files I download are legit. In this day and age of compromised servers and data, this should be an operational standard. There are numerous ways to verify checksums: you can use openssl’s various hashing options or a simple wrapper script. I prefer to automate everything, so I typically offload these types of tasks to Ansible and Chef. Both configuration management systems give you a number of ways to tackle this, and I thought I would discuss a couple of my favorite Ansible recipes in this blog post.

To illustrate how easy this process is with Ansible, let’s say you are setting up an Elasticsearch cluster and want to keep up to date with the latest GeoIP database from MaxMind. To retrieve the database and verify the checksum you can use the Ansible get_url module with the checksum parameter and the url lookup plugin:

---
- hosts: localhost
  connection: local
  vars:
    geoip_dir: "/tmp/geoip"
    geoip_db_url: "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz"
    geoip_db_md5sum_url: "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz.md5"
    geoip_db_compressed_file_name: "{{ geoip_db_url | basename }}"
    # get_url expects the checksum in "<algorithm>:<checksum>" form
    geoip_db_md5sum: "md5:{{ lookup('url', geoip_db_md5sum_url) }}"
  gather_facts: false
  tasks:
     - name: Create geoip directory if it doesn't exist
       file:
         path: "{{ geoip_dir }}"
         state: directory
         mode: "0700"  # quoted so the octal mode is passed through as written

     - name: "Downloading the latest GeoIP and comparing it to checksum {{ geoip_db_md5sum }}"
       get_url:
         url: "{{ geoip_db_url }}"
         dest: "{{ geoip_dir }}/{{ geoip_db_compressed_file_name }}"
         mode: "0600"
         checksum: "{{ geoip_db_md5sum }}"

In the example above the url lookup retrieves the MD5 checksum from a remote file and assigns it to the checksum parameter passed to get_url. get_url downloads the file to the location specified in the dest parameter and verifies it against the value of geoip_db_md5sum; if the checksums don’t match, the task fails. This is useful to show how versatile Ansible is, but the security-conscious admin should be bugging out about retrieving a payload and checksum from the same server (and over plain HTTP at that). Taking this a step further, let’s say you retrieved the checksum from a secure source and assigned it to the variable geoip_db_md5sum. This variable can then be referenced by the get_url checksum parameter:

---
- hosts: localhost
  connection: local
  vars:
    geoip_dir: "/tmp/geoip"
    geoip_db_url: "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz"
    geoip_db_md5sum_url: "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz.md5"
    geoip_db_compressed_file_name: "{{ geoip_db_url | basename }}"
    geoip_db_uncompressed_file_name: "{{ geoip_db_url | basename | replace('.tar.gz','')}}"
    # Pinned value obtained from a trusted source, in "<algorithm>:<checksum>" form
    geoip_db_md5sum: "md5:ca82582c02c4a4e57ec9d23a97adaa72"
  gather_facts: false
  tasks:
     - name: Create geoip directory if it doesn't exist
       file:
         path: "{{ geoip_dir }}"
         state: directory
         mode: "0700"

     - name: "Downloading the latest GeoIP file"
       get_url:
         url: "{{ geoip_db_url }}"
         dest: "{{ geoip_dir }}/{{ geoip_db_compressed_file_name }}"
         checksum: "{{ geoip_db_md5sum }}"
         mode: "0600"
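
If you would rather not hard-code the hash in the playbook at all, one option is to pass it in at run time. The following is only a rough sketch of that idea: the variable name geoip_db_expected_md5 and the playbook file name geoip.yml are made up for illustration, and the assert task is just a sanity check that the value looks like an MD5 hash before get_url uses it.

```yaml
---
- hosts: localhost
  connection: local
  gather_facts: false
  vars:
    geoip_dir: "/tmp/geoip"
    geoip_db_url: "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz"
  tasks:
    # geoip_db_expected_md5 is a hypothetical variable supplied at run time, e.g.
    #   ansible-playbook geoip.yml -e geoip_db_expected_md5=<hash from a trusted source>
    - name: Refuse to run without a well-formed MD5 hash
      assert:
        that:
          - geoip_db_expected_md5 is defined
          - "geoip_db_expected_md5 is match('^[0-9a-fA-F]{32}$')"
        msg: "Pass the expected hash with -e geoip_db_expected_md5=<hash>"

    - name: Create geoip directory if it doesn't exist
      file:
        path: "{{ geoip_dir }}"
        state: directory
        mode: "0700"

    - name: Download the GeoIP database and verify it against the supplied hash
      get_url:
        url: "{{ geoip_db_url }}"
        dest: "{{ geoip_dir }}/{{ geoip_db_url | basename }}"
        checksum: "md5:{{ geoip_db_expected_md5 }}"
        mode: "0600"
```

The upside of this approach is that the playbook never needs to change when the database is refreshed; only the value handed to -e does.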

Simple, elegant, but wait, there’s more! Ansible also has a stat module which you can use to retrieve the checksum of a file. This can be combined with get_url to achieve a similar result (this isn’t the ideal way to solve this problem, since the file lands on disk before it is verified, but it is included to show the power of Ansible):

---
- hosts: localhost
  connection: local
  vars:
    geoip_dir: "/tmp/geoip"
    geoip_db_url: "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz"
    geoip_db_md5sum_url: "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz.md5"
    geoip_db_compressed_file_name: "{{ geoip_db_url | basename }}"
    geoip_db_uncompressed_file_name: "{{ geoip_db_url | basename | replace('.tar.gz','')}}"
    geoip_db_md5sum: "ca82582c02c4a4e57ec9d23a97adaa72"
  gather_facts: false
  tasks:
     - name: Create geoip directory if it doesn't exist
       file:
         path: "{{ geoip_dir }}"
         state: directory
         mode: "0700"

     - name: "Downloading the latest GeoIP file"
       get_url:
         url: "{{ geoip_db_url }}"
         dest: "{{ geoip_dir }}/{{ geoip_db_compressed_file_name }}"
         mode: "0600"

     # Newer Ansible releases no longer return st.stat.md5 by default, so ask
     # stat for an MD5 hash explicitly; it is returned in st.stat.checksum.
     - name: "Computing the MD5 checksum of {{ geoip_db_compressed_file_name }}"
       stat:
         path: "{{ geoip_dir }}/{{ geoip_db_compressed_file_name }}"
         checksum_algorithm: md5
       register: st

     - name: "Extract the geo IP database if it matches checksum {{ geoip_db_md5sum }}"
       unarchive:
         src: "{{ geoip_dir }}/{{ geoip_db_compressed_file_name }}"
         dest: "{{ geoip_dir }}"
       when: geoip_db_md5sum == st.stat.checksum

In the example above the GeoIP database is downloaded and then the stat module is called to retrieve a variety of file metadata. One piece of that metadata is an MD5 hash of the file passed as an argument to path. This value is compared against the value stored in geoip_db_md5sum, and if the comparison succeeds the next task (unarchive in this case) will run; if it doesn’t, that task is simply skipped. Ansible supports a number of stronger hash algorithms (SHA256, SHA512, etc.), which should be used in place of MD5 if you have the ability to do so. Gotta love you some Ansible!
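
A mismatch in the stat-based playbook only skips the unarchive task, so a bad download can slip by unnoticed. Here is a rough sketch of one way to tighten that up, using SHA-256 and the assert module so the play stops on a mismatch. The variable geoip_db_expected_sha256 and the file name geoip.yml are hypothetical (neither appears in the playbooks above), and whether a trusted SHA-256 hash is available for this particular download is an assumption.

```yaml
---
- hosts: localhost
  connection: local
  gather_facts: false
  vars:
    geoip_dir: "/tmp/geoip"
    geoip_db_url: "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz"
    geoip_db_compressed_file_name: "{{ geoip_db_url | basename }}"
  tasks:
    # geoip_db_expected_sha256 is a hypothetical variable supplied at run time, e.g.
    #   ansible-playbook geoip.yml -e geoip_db_expected_sha256=<hash from a trusted source>
    - name: Create geoip directory if it doesn't exist
      file:
        path: "{{ geoip_dir }}"
        state: directory
        mode: "0700"

    - name: Download the archive (not yet verified)
      get_url:
        url: "{{ geoip_db_url }}"
        dest: "{{ geoip_dir }}/{{ geoip_db_compressed_file_name }}"
        mode: "0600"

    - name: Compute the SHA-256 checksum of the archive
      stat:
        path: "{{ geoip_dir }}/{{ geoip_db_compressed_file_name }}"
        checksum_algorithm: sha256
      register: st

    - name: Stop the play unless the checksum matches
      assert:
        that:
          - geoip_db_expected_sha256 is defined
          - st.stat.checksum == geoip_db_expected_sha256
        msg: "Checksum mismatch for {{ geoip_db_compressed_file_name }}"

    - name: Extract the archive, reached only after the assert passes
      unarchive:
        src: "{{ geoip_dir }}/{{ geoip_db_compressed_file_name }}"
        dest: "{{ geoip_dir }}"
```

Because assert fails the play rather than skipping a task, the mismatch shows up as a red failure in the run output instead of a quiet "skipping" line.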

This article was posted by Matty on 2017-08-27 14:13:00 -0400