Using systemd to restart processes that crash

The gmetad process on my Ganglia server has been a bit finicky lately. Periodically it segfaults which prevents new metrics from making their way into the RRD databases it manages:

[14745149.528104] gmetad[24286]: segfault at 0 ip 00007fb498c413c1 sp 00007fb48db40358 error 4 in[7fb498ade000+1b7000]

Luckily The gmetad service runs under systemd which provides a Restart directive to revive failed processes. You can take advantage of this nifty feature by adding “Restart=always” to your unit files:

$ cat /usr/lib/systemd/system/gmetad.service
Description=Ganglia Meta Daemon

ExecStart=/usr/sbin/gmetad -d 1


Now each time gmetad pukes systemd will automatically restart it. Hopefully I will get some time in the next few weeks to go through the core file to see why it keeps puking. Until then, this band aid should work rather nicely.