Using Net-SNMP to monitor processes and execute Perl scripts by hitting a MIB

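One really cool feature of net-snmp is that it can monitor processes on a system. If snmpd notices that a process is no longer running (or that too many copies of it are running), it flags the error, and you can register a script with procfix to correct the problem. Note that, according to the snmpd.conf man page, the fix script is not invoked automatically; it runs when the corresponding prErrFix instance is set to 1. Let’s take a look at some examples: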

# At least one web server process must be running at all times

proc httpd

procfix httpd /etc/rc.d/init.d/httpd restart

# There should never be more than 10 mail processes running

# (more implies a probable mail storm, so shut down the mail system)

proc sendmail 10

procfix sendmail /etc/rc.d/init.d/sendmail stop

# There should be a single network management agent running

# (“There can be only one”)

proc snmpd 1 1
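
The thresholds in these proc lines follow the "proc NAME [MAX [MIN]]" form. As a rough illustration of how the limits are interpreted, here is a small Python sketch. The function names parse_proc and count_ok are made up for this example, and the defaulting rules are my reading of the snmpd.conf man page rather than code taken from net-snmp:

```python
import math


def parse_proc(line):
    """Parse a 'proc NAME [MAX [MIN]]' line into (name, minimum, maximum).

    Assumed defaults (my reading of snmpd.conf(5), not net-snmp code):
      - neither MAX nor MIN given -> at least one process must run
      - only MAX given            -> no more than MAX, MIN defaults to 0
      - MAX of 0                  -> no upper limit
    """
    fields = line.split()
    if not 2 <= len(fields) <= 4 or fields[0] != "proc":
        raise ValueError("not a proc directive: %r" % line)
    name = fields[1]
    maximum = int(fields[2]) if len(fields) >= 3 else 0
    minimum = int(fields[3]) if len(fields) == 4 else 0
    if maximum == 0 and minimum == 0:
        minimum = 1  # "at least one"
    if maximum == 0:
        maximum = math.inf  # no upper limit
    return name, minimum, maximum


def count_ok(line, running):
    """Return True if `running` processes satisfies the proc directive."""
    _name, minimum, maximum = parse_proc(line)
    return minimum <= running <= maximum
```

For example, count_ok("proc sendmail 10", 11) is False (too many mail processes), while count_ok("proc snmpd 1 1", 1) is True.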

This can be an interesting use of Puppet, Chef, or CFengine, which are configuration management engines. Depending upon the type of host (web server, mail server, etc.), you could deploy a different snmpd.conf file for each environment. Granted, things like Solaris 10’s SMF can automatically restart services, but this is a cool feature built into net-snmp if you weren’t already aware that it was there! Section 8 of the net-snmp README.Solaris file also has some other examples if you’re interested.

One other really cool feature of net-snmp is the ability to remotely execute Perl scripts by simply hitting a MIB. One example could be something like a “clean up disk space” script. Here’s an excerpt from the README.Solaris file showing an example of the Perl script execution in action:

Net-SNMP may be compiled with Perl support by configuring like:

./configure --enable-embedded-perl …

Once you have compiled and installed net-snmp you can test the Perl capabilities of the final installation by doing the following:

Copy the perl_module.pl script found at

http://www.net-snmp.org/tutorial-5/toolkit/perl/index.html

to /usr/local/net-snmp and modify your /usr/local/share/snmp/snmpd.conf file to contain the entry:

perl do "/usr/local/net-snmp/perl_module.pl";

then do:

/usr/local/bin/snmpwalk -v 2c -c whatever localhost .1.3.6.1.4.1.8072.999

It should return the following:

NET-SNMP-MIB::netSnmp.999.1.2.1 = STRING: "hello world"

Net-snmp should now work in an OpenSolaris non-global zone

While debugging a net-snmp issue a while back, I came across the following error:

error on subcontainer ‘interface container’ insert (-1)

These errors are caused by OpenSolaris bug #6640675, which causes all interfaces to be assigned an index value of 0 (this leads net-snmp to think there are duplicate interfaces). The fix was just integrated into Nevada, so hopefully the code will be backported to Solaris 10.

Net-snmp returns zeros for various UDP and TCP mibII data

While testing out the latest net-snmp bits, I noticed that various TCP and UDP mibII OIDs would continuously display zeros on my Solaris 10 update 6 host:

$ snmpwalk -c … -v 2c localhost udpInDatagrams.0
UDP-MIB::udpInDatagrams.0 = Counter32: 0

# Generate lots of UDP traffic

$ snmpwalk -c … -v 2c localhost udpInDatagrams.0
UDP-MIB::udpInDatagrams.0 = Counter32: 0

This turns out to be caused by a net-snmp bug, which is fixed by applying the patch attached to the bug report. I am posting this here to help others who may bump into this issue.
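
The before/after comparison above is easy to script if you want to check a batch of hosts. The sketch below assumes the "NAME = Counter32: VALUE" output format shown in the snmpwalk lines above; parse_counter32, counter_delta and looks_stuck are hypothetical helper names, and the wraparound handling assumes an ordinary 32-bit counter:

```python
import re

# Matches lines such as: UDP-MIB::udpInDatagrams.0 = Counter32: 0
COUNTER_RE = re.compile(r"^(?P<oid>\S+)\s*=\s*Counter32:\s*(?P<value>\d+)$")


def parse_counter32(line):
    """Return (oid, value) from one line of snmpwalk output."""
    match = COUNTER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a Counter32 line: %r" % line)
    return match.group("oid"), int(match.group("value"))


def counter_delta(before, after):
    """Difference between two samples, allowing for 32-bit wraparound."""
    return (after - before) % 2**32


def looks_stuck(before_line, after_line):
    """True if the counter did not move between the two samples."""
    oid_before, before = parse_counter32(before_line)
    oid_after, after = parse_counter32(after_line)
    if oid_before != oid_after:
        raise ValueError("samples are for different OIDs")
    return counter_delta(before, after) == 0
```

Feeding it the two identical udpInDatagrams.0 lines above makes looks_stuck() return True, which is the symptom described here (provided traffic really was generated between the two samples).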

Debugging net-snmp problems

I spent a fair amount of time debugging a bizarre net-snmp issue yesterday, and learned a TON about how net-snmp is implemented. While reading through the net-snmp code, I came across a number of macros similar to the following:

DEBUGMSGTL(("mibII/udpScalar", "Initialising UDP scalar group\n"));

The first argument to the DEBUGMSGTL() macro is a token name, which can be passed to the snmpd daemon’s “-D” option to get verbose debugging data:

$ snmpd -c /etc/snmpd.conf -p /var/run/snmpd.pid -Lo -q -V -f -D kernel_sunos5,mibII/udpScalar

Received SNMP packet(s) from UDP: [0.0.0.0]->[1.2.3.4.102]:-14599
  GETNEXT message
    -- UDP-MIB::udpInDatagrams
kernel_sunos5: getMibstat (10, *, 40, 0, *, *)
kernel_sunos5: ... cache_valid 0 time 0 ttl 30 now 1239840358
kernel_sunos5: ...... getmib (263, 0, ...)
kernel_sunos5:dlpi: calling getentry aac: req_type: 0, buf: 24, entrysize: 40
kernel_sunos5: bad cache length 24 - not multiple of entry size 40
kernel_sunos5: ...... getmib returns 2
kernel_sunos5: ... result 1 rc 2
kernel_sunos5: ... getMibstat returns 1
mibII/udpScalar: getMibstat call from udp_load : MIB_UDP 10, sizeof(mib2_udp_t): 40
mibII/udpScalar: Loaded UDP scalar Group (solaris)
mibII/udpScalar: Handler - mode GET
mibII/udpScalar: oid: UDP-MIB::udpInDatagrams.0
Received SNMP packet(s) from UDP: [0.0.0.0]->[1.2.3.4]:-14599



To find the tokens that are available, you can bust out the trusty find utility:

$ cd net-snmp-5.4.2.1

$ find . -type f | xargs grep DEBUGMSG |more

./agent/agent_handler.c:        DEBUGMSGTL(("handler::register", "Registering %s (", reginfo->handlerName));
./agent/agent_handler.c:        DEBUGMSG(("handler::register", "::%s", handler->handler_name));
./agent/agent_handler.c:        DEBUGMSG(("handler::register", ") at "));
......
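
The grep output still has to be boiled down to a list of unique token names by eye. As a rough alternative, the Python sketch below walks a source tree and collects the first string argument of each DEBUGMSG-style macro. The regular expression is an assumption on my part: it only catches calls where the token is a string literal on the same line as the macro name. collect_debug_tokens is a made-up helper name:

```python
import os
import re

# First string literal inside DEBUGMSG((...)), DEBUGMSGTL((...)), and similar
TOKEN_RE = re.compile(r'\bDEBUGMSG\w*\s*\(\s*\(\s*"([^"]+)"')


def collect_debug_tokens(root):
    """Return a sorted list of unique debug tokens found under `root`."""
    tokens = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Only look at C sources and headers
            if not filename.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as source:
                for line in source:
                    tokens.update(TOKEN_RE.findall(line))
    return sorted(tokens)
```

Calling collect_debug_tokens("net-snmp-5.4.2.1") should give a list of names that can be joined with commas and handed to the “-D” option.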



While the debugging output is a bit primitive, it is extremely useful when you can compare it side-by-side with the net-snmp source code. This helped me locate and fix an annoying bug (data is incorrect on Solaris 10 u4+ hosts), which allowed me to roll out the new version of the code to various hosts (the new version fixes a couple of bugs that lead to the daemon hanging after a period of time). Debugging is a bunch of fun, and there is nothing better than finding a solution to an issue!