Conditionally restarting systemd services

In a previous post I discussed how one of my systemd services was getting continuously restarted, causing the CPU to spike. This isn't ideal, and after re-reading the systemd manual pages I came across a couple of useful options to control when and how frequently a systemd service will be restarted. The first option is RestartSec, which controls how long systemd waits before restarting a process after a failure occurs. Systemd also has RestartForceExitStatus and RestartPreventExitStatus, which let you define the exit statuses or signals that should, or should not, trigger a restart. To ensure that logstash isn't continuously restarted when a configuration issue is present, I set RestartPreventExitStatus to 1. Abnormal terminations from signals like SIGSEGV, SIGABRT, etc. will still trigger a restart, but an exit with return code 1 will not.
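
Here is a rough sketch of how these directives could be combined in a drop-in override for the logstash unit. The drop-in path, the RestartSec value, and the Restart policy are illustrative assumptions, not copied from my actual config:

$ cat /etc/systemd/system/logstash.service.d/restart.conf
[Service]
# Restart only on abnormal terminations (SIGSEGV, SIGABRT, unclean exits)
Restart=on-failure
# Wait 30 seconds before attempting a restart (assumed value)
RestartSec=30
# Never restart if the process exits with status 1 (e.g., a bad config)
RestartPreventExitStatus=1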

Forwarding the systemd journal to syslog

I am a big fan of the ELK stack and use it daily for various parts of my job. All of the logs from my physical systems, VMs, and containers get funneled into elasticsearch, indexed, and made available for me to slice and dice with Kibana. In addition to syslog data, I also like to funnel the systemd journal into elasticsearch. This is easily accomplished by changing the ForwardToSyslog configuration directive in journald.conf to yes:

$ sed -i.bak 's/^#ForwardToSyslog=.*/ForwardToSyslog=yes/' /etc/systemd/journald.conf

This small change will cause all journal entries to get routed to the local syslog daemon. Once they are there you can set up your favorite log shipping solution to get them into your elasticsearch cluster.
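
Journald only picks up the new setting after a restart, so bounce the daemon and sanity check the directive afterwards:

$ systemctl restart systemd-journald
$ grep ForwardToSyslog /etc/systemd/journald.conf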

Using systemd to restart processes that crash

The gmetad process on my Ganglia server has been a bit finicky lately. Periodically it segfaults, which prevents new metrics from making their way into the RRD databases it manages:

[14745149.528104] gmetad[24286]: segfault at 0 ip 00007fb498c413c1 sp 00007fb48db40358 error 4 in libc-2.17.so[7fb498ade000+1b7000]

Luckily, the gmetad service runs under systemd, which provides a Restart directive to revive failed processes. You can take advantage of this nifty feature by adding "Restart=always" to your unit files:

$ cat /usr/lib/systemd/system/gmetad.service
[Unit]
Description=Ganglia Meta Daemon
After=network.target

[Service]
Restart=always
ExecStart=/usr/sbin/gmetad -d 1

[Install]
WantedBy=multi-user.target
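
Since the unit file was edited directly, systemd needs to re-read it before the new Restart setting takes effect; something along these lines should do it:

$ systemctl daemon-reload
$ systemctl restart gmetad.service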

Now each time gmetad pukes, systemd will automatically restart it. Hopefully I will get some time in the next few weeks to go through the core file to see why it keeps puking. Until then, this band-aid should work rather nicely.