Automating cron jobs with the Ansible cron module

Over the past month I have been rewriting some cron scripts to enhance monitoring and observability. I’ve also been refactoring my Ansible
playbooks to handle deploying these scripts in a consistent fashion. Ansible ships with a cron module which makes this process a breeze.
The cron module supports the familiar cron attributes (minute, hour, day, month, weekday, the job to run, etc.) and takes the following form:

- name: Cron job to prune old elasticsearch indexes
  cron:
    name: cleanup-elasticsearch-indexes
    minute: 0
    hour: 0
    job: /scripts/curator/curator_clean_logs.sh
    state: present
    user: curator

When I first played around with this module I noticed that each playbook run would result in a new cron entry being added. So instead of getting one curator log cleanup job when the play executed, I would get one entry per run. This is obviously very bad. When I read back through the cron module documentation I came across this little nugget for the "name" parameter:

Note that if name is not set and state=present, then a new crontab entry will always be created, regardless of existing ones.

Ansible uses the name to tag the entry, and if the tag already exists a new cron job won’t be added to the system (in case you’re interested, this is implemented by the find_job() method in cron.py). Small subtleties like this really bring to light the importance of a robust test environment. I am currently using Vagrant to solve this problem, but there are also a number of solutions documented in the Ansible testing strategies guide.
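
If you are curious what the tag actually looks like, list the crontab after a play run and you should see the name embedded as a comment directly above the entry, along these lines:

$ sudo crontab -l -u curator
#Ansible: cleanup-elasticsearch-indexes
0 0 * * * /scripts/curator/curator_clean_logs.sh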

Using xargs and lscpu to spawn one process per CPU core

One of my friends reached out to me earlier this week to ask if there was an easy way to run multiple Linux processes in parallel. There are
several ways to approach this problem, but most of them don’t take hardware cores and threads into account. My preferred solution for CPU intensive operations is to use the xargs parallel option ("-P") along with the CPU core count reported by lscpu. This allows me to run one process per core, which is ideal for CPU intensive applications. But enough talk, let’s see an example.

Let’s say you need to compress a directory full of log files and want to run one compression job on each CPU core. To locate the number of cores you can combine lscpu and grep:

$ CPU_CORES=$(lscpu -p=CORE,ONLINE | grep -c 'Y')
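
One thing to keep in mind: lscpu -p=CORE,ONLINE prints one line per logical CPU, so on a hyperthreaded box the count above includes both threads of each core. If you want to limit the count to physical cores, you can dedupe on the core ID (a variation I put together for this post):

$ CPU_CORES=$(lscpu -p=CORE,ONLINE | awk -F, '$2 == "Y" {print $1}' | sort -u | wc -l)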

To generate a list of files we can run find and pass the output of that to xargs:

$ find . -type f -name \*.log | xargs -n1 -P${CPU_CORES} bzip2
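
One caveat: if any of the log file names contain spaces or newlines, the whitespace-splitting version above can misbehave. With GNU find and xargs, the NUL-delimited variant is safer:

$ find . -type f -name '*.log' -print0 | xargs -0 -n1 -P${CPU_CORES} bzip2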

Either way, xargs will create one bzip2 process per core and pass each one a log file to compress. To make sure the pipeline is working as intended we can watch it with a simple while loop:

$ while :; do ps auxwww | grep [b]zip; sleep 1; done

matty    14322  0.0  0.0 113968  1228 pts/0    S+   07:24   0:00 xargs -n1 -P4 bzip2
matty    14323 95.0  0.0  13748  7624 pts/0    R+   07:24   0:11 bzip2 ./log10.txt.log
matty    14324 95.9  0.0  13748  7616 pts/0    R+   07:24   0:11 bzip2 ./log2.txt.log
matty    14325 96.0  0.0  13748  7664 pts/0    R+   07:24   0:11 bzip2 ./log3.txt.log
matty    14326 94.9  0.0  13748  7632 pts/0    R+   07:24   0:11 bzip2 ./log4.txt.log

There are a number of other useful things you can do with the items listed above but I will leave that to your imagination. Viva la xargs!

Using the Python SMTP DebuggingServer to test SMTP communications

During the development of the dns-domain-expiration-checker script I needed a way to test SMTP mail delivery without relying on an actual mail exchanger. While reading through the smtplib and smtpd documentation I came across the SMTP DebuggingServer. This nifty module allows you to run a local mail relay which prints the messages it receives to standard out. To enable it you can load the smtpd module and instruct it to run the DebuggingServer class on the IP and port passed as arguments:

$ python -m smtpd -c DebuggingServer -n localhost:8025

This will fire up a local mail server on localhost:8025, and each message received will be printed to STDOUT. If I run my DNS domain expiration script and point it at localhost:8025:

$ dns-domain-expiration-checker.py --domainname prefetch.net --expiredays 2000 --smtpserver localhost --smtpport 8025 --email

The debugging server prints the SMTP headers and body each time the script generates an e-mail:

$ python -m smtpd -c DebuggingServer -n localhost:8025

---------- MESSAGE FOLLOWS ----------
Content-Type: multipart/mixed; boundary="===============3200155514135298957=="
MIME-Version: 1.0
From: root
To: root
Subject: The DNS Domain prefetch.net is set to expire in 1041 days
X-Peer: 127.0.0.1

--===============3200155514135298957==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

Time to renew prefetch.net
--===============3200155514135298957==--
------------ END MESSAGE ------------
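
If you want to throw a quick test message at the debugging server without involving a full script, curl’s SMTP support works nicely (assuming your curl was built with it; the addresses below are just placeholders):

$ printf 'Subject: test\n\nHello from curl\n' > /tmp/msg.txt
$ curl smtp://localhost:8025 --mail-from root@localhost --mail-rcpt root@localhost --upload-file /tmp/msg.txt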

Super useful module for troubleshooting SMTP and e-mail communications!

Normalizing date strings with the Python dateutil module

I recently attended a Python workshop with Jeff Cohen and he answered a ton of random questions from students. One student mentioned that he was overwhelmed with the number of Python modules, and Jeff told us that he has evolved his Python skills by learning at least one new module each week. I’ve started doing this as well and it’s been a HUGE help. Each Sunday I find a module I’m not familiar with on PyPI or in the standard library and read through the documentation. Then I work through a number of coding exercises to see how the module works. Once I’m comfortable using it I try to read through the source code to see how it’s implemented under the covers. The last part is time consuming, but it’s a great way to really understand how the module works.

While perusing the date modules last weekend I came across dateutil. This handy little module provides a built-in parser that takes arbitrary dates and normalizes them into datetime objects. If you are dealing with different data sources without a common set of formatting standards you will love this little guy! To see how this works, say you have two dates and need to get the number of days between them. The following snippet does just that:

>>> import dateutil.parser
>>> date1 = "2020-06-23T16:56:05Z"
>>> date2 = "June 22 2018 09:23:45"
>>> d1 = dateutil.parser.parse(date1, ignoretz=True)
>>> d2 = dateutil.parser.parse(date2, ignoretz=True)
>>> print((d1 - d2).days)
732

If you need higher resolution you can use the seconds and microseconds attributes, or the total_seconds() method, to drill down further. This useful module made rewriting dns-domain-expiration-checker.py a breeze!

Monitoring DNS domain name expiration with dns-domain-expiration-checker

Several years ago I wrote a simple bash script to check the expiration dates of the DNS domains I own. At the time I wrote it purely for my own needs, but after receiving hundreds of e-mails from folks who were using it (and submitting patches) I decided to enhance it to be more useful. As time has gone on, registrar WHOIS data formats have changed, and I came to the realization that there is no standard time format for expiration records. I needed a more suitable solution, so I spent last weekend rewriting my original script in Python. The new version is available on GitHub and solves all of the issues I previously encountered.

To work around the registrar WHOIS data format issue I created a list of strings which can easily be extended as new formats are encountered (or existing ones change):

EXPIRE_STRINGS = [ "Registry Expiry Date:",
                   "Expiration:",
                   "Domain Expiration Date"
                 ]
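
To see where these strings come from, here is roughly what the relevant portion of a whois response looks like for a .net domain (illustrative, trimmed output):

$ whois prefetch.net | grep 'Registry Expiry Date:'
   Registry Expiry Date: 2020-06-23T16:56:05Z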

These strings are checked against the WHOIS data returned by the whois binary (located via `which whois`), and if a match is found the last field on the line is saved off, which should be the date. To get around the date formatting issues I pulled in the dateutil module, which does an AMAZING job of normalizing dates. It allows me to feed it random date formats, which are then normalized into datetime objects I can perform math on. The GitHub README contains several examples showing how to use the script. The most basic form allows you to see expiration data for a domain or a set of domains in a file:

$ dns-domain-expiration-checker.py --domainname prefetch.net --interactive
Domain Name                Registrar             Expiration Date                 Days Left
prefetch.net               DNC Holdings, Inc.    2020-06-23 16:56:05             1056

The script also provides SMTP alerting, and Nagios support will be added in the near future. If you use the script and encounter any bugs please shoot me an issue on GitHub.

Using awk character classes to simplify parsing complex strings

This week I was reading a shell script in a GitHub repository to see if it would be a good candidate for automating a task. As I was digging through the code I noticed a lengthy shell pipeline used to parse strings similar to this one:

Thu Jul 20 18:13:04 EDT 2017 snarble foo bar (gorp): blatch (fmep): gak+

Here is the code the author was using to extract the string "gorp":

$ cat /foo/bar.txt | grep "snarble" | awk '{print $10}' | awk -F'(' '{print $2}' | awk -F')' '{print $1}'

After my eyes recovered I thought this would be a good candidate to simplify with awk character classes. These are incredibly useful for applying numerous field separators to a given line of input. I took what the original author had and simplified it to this:

$ awk -F'[()]+' '/snarble/ {print $2}' /foo/bar.txt

The argument passed to the field separator option (-F) contains a list of characters to use as delimiters. The string between the slashes is used to match all lines that contain the word snarble. I find the second version much easier to read, and character classes are super useful!
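
To sanity check the simplified version you can feed it the sample line directly; the "+" in the character class also collapses consecutive delimiters into a single field break:

$ echo 'Thu Jul 20 18:13:04 EDT 2017 snarble foo bar (gorp): blatch (fmep): gak+' | awk -F'[()]+' '{print $2}'
gorp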