I’ve been around Linux and UNIX for quite some time, and one thing that has always piqued my interest is debugging broken software. Bryan Cantrill made some excellent points on why postmortem debugging is needed at DockerCon, and the following video is a must watch:
His points on restarting a broken container w/o root causing the source of the failure are SPOT ON! I also love his mad cow analogy. I’ve had the same mindset since I started managing infrastructure, and I find the whole root cause process exciting and fun. Who doesn’t love looking at backtraces, registers and memory on the stack and heap?!?! Most admins I’ve met like debugging, but they dread seeing the following:
$ ./badproc
Segmentation fault (core dumped) ./badproc
I on the other hand start to drool when a piece of software I manage (but didn’t write) encounters a fatal condition that leads to its demise. If I can’t locate a bug report with a fix, I’ll grab a cup of coffee, ensure debugging symbols are present and fire up gdb to root cause the failure. My first experience root causing a segmentation violation was with snort. This was an extremely valuable learning experience, and at the time the Internet had limited resources explaining stack layouts, memory organization and how gdb can be used to locate problems. Now that conferences and individuals are posting high quality material to YouTube, two clicks will get you access to amazing gdb resources like this (all three videos are definitely worth watching):
We also have access to step-by-step software debugging guides like the one Brendan Gregg posted to his blog last year. This coming weekend I will be immersing myself in another epic debugging session and I can’t wait to see what I find (and learn). We all need to learn to embrace the unhappy signals that take down our applications. You learn a TON by doing so and make the open source world better at the same time.
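For anyone who wants to dig into a core file like the one badproc left behind above, a minimal postmortem session is just a handful of gdb commands: load the core alongside the binary, print the backtrace, inspect the registers, and list the source around the faulting frame (the file names here are placeholders):
$ gdb ./badproc core
(gdb) bt
(gdb) info registers
(gdb) frame 0
(gdb) list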
Over the past month I have been rewriting some cron scripts to enhance monitoring and observability. I’ve also been refactoring my ansible playbooks to handle deploying these scripts in a consistent fashion. Ansible ships with the cron module which makes this process a breeze. The cron module has all of the familiar cron attributes (minute, hour, day, month, weekday, the job to run, etc.) and takes the following form:
- name: Cron job to prune old elasticsearch indexes
  cron:
    name: cleanup-elasticsearch-indexes
    minute: 0
    hour: 0
    job: /scripts/curator/curator_clean_logs.sh
    state: present
    user: curator
When I first played around with this module I noticed that each playbook run would result in a cron entry being added. So instead of getting one curator log cleanup job when the play is executed, I would get one entry per run. This is obviously very bad. When I read back through the cron module documentation I came across this little nugget for the “name” parameter:
Note that if name is not set and state=present, then a new crontab entry will always be created, regardless of existing ones.
Ansible uses the name to tag the entry, and if the tag already exists a new cron job won’t be added to the system (in case you’re interested, this is implemented by the find_job() method in cron.py). Small subtleties like this really bring to light the importance of a robust test environment. I am currently using vagrant to solve this problem, but there are also a number of solutions documented in the Ansible testing strategies guide.
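If you list the curator user’s crontab after the play runs, you can see the comment Ansible uses as that tag. The entry should look roughly like this (sketched from the playbook above):
$ crontab -u curator -l
#Ansible: cleanup-elasticsearch-indexes
0 0 * * * /scripts/curator/curator_clean_logs.sh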
One of my friends reached out to me earlier this week to ask if there was an easy way to run multiple Linux processes in parallel. There are several ways to approach this problem, but most of them don’t take into account hardware cores and threads. My preferred solution for CPU intensive operations is to use the xargs parallel option (“-P”) along with the CPU cores listed in lscpu. This allows me to run one process per core, which is ideal for CPU intensive applications. But enough talk, let’s see an example.
Let’s say you need to compress a directory full of log files and want to run one compression job on each CPU core. To locate the number of cores you can combine lscpu and grep:
$ CPU_CORES=$(lscpu -p=CORE,ONLINE | grep -c 'Y')
To generate a list of files we can run find and pass the output of that to xargs:
$ find . -type f -name '*.log' | xargs -n1 -P${CPU_CORES} bzip2
The xargs command listed above will create one bzip2 process per core and pass it a log file to process. To monitor the pipeline to make sure it is working as intended we can run a simple while loop:
$ while :; do ps auxwww | grep [b]zip; sleep 1; done
matty 14322 0.0 0.0 113968 1228 pts/0 S+ 07:24 0:00 xargs -n1 -P4 bzip2
matty 14323 95.0 0.0 13748 7624 pts/0 R+ 07:24 0:11 bzip2 ./log10.txt.log
matty 14324 95.9 0.0 13748 7616 pts/0 R+ 07:24 0:11 bzip2 ./log2.txt.log
matty 14325 96.0 0.0 13748 7664 pts/0 R+ 07:24 0:11 bzip2 ./log3.txt.log
matty 14326 94.9 0.0 13748 7632 pts/0 R+ 07:24 0:11 bzip2 ./log4.txt.log
There are a number of other useful things you can do with the items listed above but I will leave that to your imagination. Viva la xargs!
During the development of the dns-domain-expiration-checker script I needed a way to test SMTP mail delivery w/o relying on an actual mail exchanger. While reading through the smtplib and smtpd documentation I came across the SMTP debugging server. This nifty module allows you to run a local mail relay which will print the messages it receives to standard out. To enable it you can load the smtpd module and instruct it to run the DebuggingServer class on the IP and port passed as arguments:
$ python -m smtpd -c DebuggingServer -n localhost:8025
This will fire up a local mail server on localhost:8025 and each message received will be printed to STDOUT. If I run my DNS domain expiration script and point it to localhost:8025:
$ dns-domain-expiration-checker.py --domainname prefetch.net --email --expiredays 2000 --smtpserver localhost --smtpport 8025
The debugging server prints the SMTP headers and body each time the script generates an e-mail:
$ python -m smtpd -c DebuggingServer -n localhost:8025
---------- MESSAGE FOLLOWS ----------
Content-Type: multipart/mixed; boundary="===============3200155514135298957=="
MIME-Version: 1.0
From: root
To: root
Subject: The DNS Domain prefetch.net is set to expire in 1041 days
X-Peer: 127.0.0.1
--===============3200155514135298957==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Time to renew prefetch.net
--===============3200155514135298957==--
------------ END MESSAGE ------------
Super useful module for troubleshooting SMTP and e-mail communications!
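If you want to poke at the relay without running the DNS script, a short Python 3 smtplib snippet will do (the addresses and subject below are made up for illustration):
import smtplib
from email.message import EmailMessage

# Build a throwaway test message.
msg = EmailMessage()
msg["From"] = "root@localhost"
msg["To"] = "root@localhost"
msg["Subject"] = "DebuggingServer test"
msg.set_content("Hello from the smtpd debugging server test.")

# Hand the message to the local debugging server started above.
with smtplib.SMTP("localhost", 8025) as server:
    server.send_message(msg)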
I recently attended a Python workshop with Jeff Cohen and he answered a ton of random questions from students. One student mentioned that he was overwhelmed with the number of Python modules, and Jeff told us that he has evolved his Python skills by learning at least one new module each week. I’ve started doing this as well and it’s been a HUGE help. Each Sunday I find a module I’m not familiar with on PyPI or in the standard library and read through the documentation. Then I do a number of coding exercises to see how the module works. Once I’m comfortable using it I try to read through the source code to see how it’s implemented under the covers. The last part is time consuming but it’s a great way to really understand how the module works.
While perusing the date modules last weekend I came across dateutil. This handy little module provides a built-in parser to take arbitrary dates and normalize them into a datetime object. If you are dealing with different data sources without a common set of formatting standards you will love this little guy! To see how this works say you have two dates and need to get the number of days between them. The following snippet does this.
>>> import dateutil.parser
>>> date1 = "2020-06-23T16:56:05Z"
>>> date2 = "June 22 2018 09:23:45"
>>> d1 = dateutil.parser.parse(date1, ignoretz=True)
>>> d2 = dateutil.parser.parse(date2, ignoretz=True)
>>> print((d1-d2).days)
732
If you need higher resolution you can use the seconds and microseconds attributes, along with the total_seconds() method, to drill down further. This useful module made rewriting dns-domain-expiration-checker.py a breeze!
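Continuing the session above, those fields break the same timedelta down like so:
>>> delta = d1 - d2
>>> delta.days, delta.seconds, delta.microseconds
(732, 27140, 0)
>>> delta.total_seconds()
63271940.0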