Using systemd to restart processes that crash


The gmetad process on my Ganglia server has been a bit finicky lately. Periodically it segfaults, which prevents new metrics from making their way into the RRD databases it manages:

[14745149.528104] gmetad[24286]: segfault at 0 ip 00007fb498c413c1 sp 00007fb48db40358 error 4 in libc-2.17.so[7fb498ade000+1b7000]

Luckily, the gmetad service runs under systemd, which provides a Restart directive to revive failed processes. You can take advantage of this nifty feature by adding “Restart=always” to your unit files:

$ cat /usr/lib/systemd/system/gmetad.service

[Unit]
Description=Ganglia Meta Daemon
After=network.target

[Service]
Restart=always
ExecStart=/usr/sbin/gmetad -d 1

[Install]
WantedBy=multi-user.target
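One caveat, offered as a suggestion rather than part of the setup above: unit files under /usr/lib/systemd/system belong to the package and may be overwritten when it is upgraded. A drop-in override keeps the local change separate. This is a minimal sketch, assuming the unit is named gmetad.service. The file name restart.conf and the 5-second delay are arbitrary choices for illustration:

```ini
# /etc/systemd/system/gmetad.service.d/restart.conf
# Drop-in override: systemd merges this on top of /usr/lib/systemd/system/gmetad.service
[Service]
# Restart the daemon whenever it exits, including when it is killed by a signal such as SIGSEGV
Restart=always
# Wait 5 seconds before each restart (the default is 100ms)
RestartSec=5
```

Run systemctl daemon-reload and then systemctl restart gmetad so systemd picks up the override. systemctl cat gmetad shows the unit together with its drop-ins. Restart=on-failure would also cover segfaults while leaving a clean exit alone. The delay matters because systemd rate-limits restarts: by default, more than five starts within ten seconds makes it stop trying, and a tight crash loop with the 100ms default can hit that limit.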

Now each time gmetad pukes, systemd will automatically restart it. Hopefully I will get some time in the next few weeks to go through the core file to see why it keeps puking. Until then, this band-aid should work rather nicely.

This article was posted by Matty on 2016-10-12 09:50:00 -0400 EDT