Monitoring DNS servers

I recently started supporting several DNS servers running BIND 9. To ensure that these servers are up and operational at all times, I wrote a small shell script named dns-check to test the operational state of each server. The script takes a file as an argument, and each line in the file contains the IP address of a DNS server (names will also work), a name to resolve, and the record type that should be requested. If the script is unable to resolve the name for one reason or another (any return code > 0 is a failure), the script will log a message to syslog, and send e-mail to the address listed in the $ADMIN variable, or to an address passed to the “-e” option. Here is a sample run:

$ cat dns-check-sites
ns1.fooby.net mail.fooby.net A
ns2.fooby.net mail.fooby.net A

$ dns-check -e dns-admin@prefetch.net -f dns-check-sites

The script is nothing special, but might be useful to folks running DNS servers.
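
For readers who want a starting point, the check loop described above might look roughly like the following. This is a minimal sketch and not the actual dns-check script: it assumes the host utility is used for the lookups and that $ADMIN defaults to root, and the function names (lookup, notify, check_servers) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the check loop; NOT the real dns-check script.

ADMIN="${ADMIN:-root}"   # assumed default recipient

# Ask server $1 for record type $3 of name $2 (assumes the host utility).
lookup() {
    host -t "$3" "$2" "$1" > /dev/null 2>&1
}

# Log the failure to syslog and mail the administrator.
notify() {
    msg="DNS check failed: server=$1 name=$2 type=$3"
    logger -p daemon.err "$msg"
    echo "$msg" | mailx -s "dns-check failure on $1" "$ADMIN"
}

# Read "server name type" triples from file $1; return 1 if any lookup failed.
check_servers() {
    status=0
    while read -r server name rtype; do
        [ -n "$server" ] || continue        # skip blank lines
        if ! lookup "$server" "$name" "$rtype"; then
            notify "$server" "$name" "$rtype"
            status=1
        fi
    done < "$1"
    return "$status"
}
```

With the functions loaded, running check_servers dns-check-sites walks the sample file shown earlier and returns a non-zero status if any server failed to answer.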

Expanding Solaris metadevices

I recently had a file system on a Solaris Volume Manager (SVM) metadevice fill up, and I needed to expand it to make room for some additional data. Since the expansion could potentially cause problems, I backed up the file system, and saved a copy of the metastat and df output to my local workstation. Having several backups always gives me a warm fuzzy, since I know I have a way to revert back to the old configuration if something goes awry. Once the configuration was in a safe place and the data backed up, I used the umount command to unmount the /data file system, which lives on metadevice d100:

$ df -h

Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c1t0d0s0      7.9G   2.1G   5.7G    27%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   2.3G   600K   2.3G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
/usr/lib/libc/libc_hwcap1.so.1
                       7.9G   2.1G   5.7G    27%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
/dev/dsk/c1t0d0s4      4.0G   154M   3.8G     4%    /var
swap                   2.3G    32K   2.3G     1%    /tmp
swap                   2.3G    24K   2.3G     1%    /var/run
/dev/dsk/c1t0d0s3       19G   2.8G    17G    15%    /opt
/dev/md/dsk/d100        35G    35G   120M    99%    /data

$ umount /data

After the file system was unmounted, I had to run the metaclear utility to remove the metadevice from the meta state database:

$ metaclear d100
d100: Concat/Stripe is cleared

Now that the metadevice was removed, I needed to add it back with the desired layout. It is EXTREMELY important to place the device(s) back in the right order, and to ensure that the new layout doesn’t corrupt the data that exists on the device(s) that contain the file system (i.e., don’t create a RAID5 metadevice with the existing devices, since that will wipe your data when the RAID5 metadevice is initialized). In my case, I wanted to concatenate another hardware RAID protected LUN to the metadevice d100. This was accomplished by running metainit with “numstripes” equal to 2 to indicate a two-stripe concatenation, and “width” equal to 1 to indicate that each stripe should have one member:

$ metainit d100 2 1 c1t1d0s0 1 c1t2d0s0
d100: Concat/Stripe is setup
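
To make the argument layout concrete: metainit takes the number of stripes, and then a width followed by that many components for each stripe. The sketch below only prints the command for a concatenation of one-slice stripes, so it can be reviewed before anything is run; the concat_cmd helper is a made-up name and not part of SVM.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the metainit command for a
# concatenation in which every stripe has exactly one slice (width 1).
concat_cmd() {
    md="$1"; shift
    cmd="metainit $md $#"          # numstripes = number of slices given
    for slice in "$@"; do
        cmd="$cmd 1 $slice"        # width 1, then the slice itself
    done
    echo "$cmd"
}

concat_cmd d100 c1t1d0s0 c1t2d0s0
# prints: metainit d100 2 1 c1t1d0s0 1 c1t2d0s0
```

The slices are emitted in the order they are given, so the slice that already holds the file system has to come first.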

Once the new metadevice was created, I ran the mount utility to remount the /data file system, and then executed growfs to expand the file system:

$ mount /dev/md/dsk/d100 /data

$ growfs -M /data /dev/md/rdsk/d100

Warning: 2778 sector(s) in last cylinder unallocated
/dev/md/rdsk/d100:      150721830 sectors in 24532 cylinders of 48 tracks, 128 sectors
        73594.6MB in 1534 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
..............................
super-block backups for last 10 cylinder groups at:
 149821984, 149920416, 150018848, 150117280, 150215712, 150314144, 150412576,
 150511008, 150609440, 150707872
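
The size growfs reports can be checked by hand from the sector count. The following is a small sketch that assumes 512-byte sectors; the sectors_to_mb name is made up for the example.

```shell
#!/bin/sh
# Convert a sector count to megabytes, assuming 512-byte sectors.
sectors_to_mb() {
    awk -v s="$1" 'BEGIN { printf "%.1fMB\n", s * 512 / 1048576 }'
}

sectors_to_mb 150721830
# prints: 73594.6MB
```

That matches the 73594.6MB figure in the growfs output, which works out to a little under 72 gigabytes.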

After the growfs operation completed, I had some breathing room on the /data file system:

$ df -h

Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c1t0d0s0      7.9G   2.1G   5.7G    27%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   2.3G   600K   2.3G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
/usr/lib/libc/libc_hwcap1.so.1
                       7.9G   2.1G   5.7G    27%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
/dev/dsk/c1t0d0s4      4.0G   154M   3.8G     4%    /var
swap                   2.3G    32K   2.3G     1%    /tmp
swap                   2.3G    24K   2.3G     1%    /var/run
/dev/dsk/c1t0d0s3       19G   2.8G    17G    15%    /opt
/dev/md/dsk/d100        71G    36G    35G    49%    /data

The fact that you have to unmount the file system to grow a metadevice is somewhat frustrating, since every other LVM package I have used allows volumes and file systems to be expanded on the fly (it’s a good thing ZFS is shipping with Solaris). As with all data migrations, you should test storage expansion operations prior to performing them on production systems.

Using the ultimate boot disk to test hardware

I have been using the Ultimate Boot Disk for the past few months to test x86 and x64 hardware. The disk contains numerous awesome utilities that can be used to test memory, disks and CPUs. The following packages come on the CD, and are four of my personal favorites:

Memtest86+ to test memory

Darik’s Boot and Nuke to securely erase data from a disk drive

CPU burn to test CPUs

PCI sniffer to identify the type of card in a system

Checking the integrity of Solaris binaries

One new feature in Solaris 10 that doesn’t get much press is the basic auditing and reporting tool (bart). Bart allows you to generate integrity checks for one or more files on a server. This allows you to compare two groups of file integrity checks (groups of file integrity checks are referred to as manifests in the bart documentation) to see what changed on a server. Bart is super easy to use, and comes with just two options, “create” and “compare.” The “create” option can be used to create a new manifest, and the “compare” option can be used to compare the contents of two manifests. The following example shows how to use the “create” option to generate a file integrity check of every file that resides in a global zone’s* root file system:

$ bart create -R / > bart.manifest.08-14-2006.1

$ bart create -R / > bart.manifest.08-14-2006.2

Once the two manifests are created, the bart “compare” option can be run to compare the manifests:

$ bart compare bart.manifest.08-14-2006.1 bart.manifest.08-14-2006.2

/var/adm/messages:
  size  control:8866  test:8957
  mtime  control:44e100a3  test:44e1019e
  contents  control:b349f015631c87065842009d87a1a456  test:be07b4863f18165fcd154b9f0fce2a64

/var/cron/log:
  size  control:76152  test:76396
  mtime  control:44e10070  test:44e1019d
  contents  control:7cd2f996f0cec248cd5eae4f3e6cce7e  test:29bf6ecbd171ebe1879e641d5b5739f2

/var/log/pool/poold:
  size  control:651159  test:652111
  mtime  control:44e10160  test:44e10232
  contents  control:9339cb8fac19bb9231e35866cd1a2942  test:89880fbd73332cfc770454fdd034cba1

/var/svc/log/network-ssh:default.log:
  size  control:226076  test:226181
  mtime  control:44e10070  test:44e1019d
  contents  control:5a856f39ede7c7528f9405f573eedd5b  test:778ebe08677923862b03aec5d41e3c51

As you can see from the output above, several logfiles changed between two consecutive runs. While not a complete file integrity solution, bart is a super useful utility, and should be used after each system installation and patch application.
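
Since log files change constantly, it can help to reduce the compare output to a plain list of changed paths, and to drop the ones you expect to change. The following is a minimal sketch of such a filter; the changed_paths helper and the idea of ignoring everything under a single prefix are assumptions made for illustration, not something bart provides.

```shell
#!/bin/sh
# Hypothetical filter: read "bart compare" output on stdin and print the
# paths that changed, skipping anything that starts with the prefix in $1.
changed_paths() {
    awk -v skip="$1" '
        /^\// && /:$/ {
            sub(/:$/, "")                       # strip the trailing colon
            if (skip == "" || index($0, skip) != 1)
                print
        }'
}
```

For example, piping the output of the bart compare command shown above through changed_paths /var/ would print nothing for that run, since every file that changed lives under /var.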

* The bart manual page states that you shouldn’t run bart on the root file system in a non-global zone.

Locking down the OS X firewall

I attended Jay Beale’s talk, “Discovering OS X weaknesses and fixing them with the new Bastille Linux port,” at Defcon last week. Jay did a great job presenting, and pointed out several HUGE flaws in the default OS X “stealth” firewall rule set. The first major problem Jay pointed out is that all UDP datagrams with a source port of 67 or 5353 are allowed in (this allows you to talk to ntpd and cups, which have a rocky security history). The second major flaw is that the default configuration blocks ICMP type 8 (ICMP echo requests), but allows all other ICMP traffic in. And finally, OS X defaults to an allow any rule, which lets cruft like Bonjour and the service locator pollute your network with the version of OS X you are running, and the hardware architecture you are running on (this is a shellcoder’s dream!). I take security rather seriously, so I sat down the night I got home, read the ipfw manual page, and created the following firewall rule set to deny all traffic by default, and allow a few trusted services out:

$ cat /etc/rc.firewall

#!/bin/sh

# Variables to simplify maintenance (these are comma delimited)
DNS_SERVERS="10.1.1.1"

# Enable firewall logging
/usr/sbin/sysctl -w net.inet.ip.fw.verbose=1

# Flush existing rules
/sbin/ipfw -f flush

# If the rule was added to the dynamic rule table, let it in
/sbin/ipfw add check-state

# Allow traffic to flow on the loopback interface
/sbin/ipfw add allow all from any to any via lo0

# Allow established connections
/sbin/ipfw add allow tcp from any to any established

# Allow SSH connections
/sbin/ipfw add allow tcp from me to any 22 keep-state

# Allow non-secure web traffic
/sbin/ipfw add allow tcp from me to any 80 keep-state

# Allow secure web traffic
/sbin/ipfw add allow tcp from me to any 443 keep-state

# Allow secure LDAP traffic
/sbin/ipfw add allow tcp from me to any 636 keep-state

# Allow IMAPS
/sbin/ipfw add allow tcp from me to any 993 keep-state

# Allow me to get to my DNS servers
/sbin/ipfw add allow udp from me to ${DNS_SERVERS} 53 keep-state

# Optionally allow ICMP traffic out
# /sbin/ipfw add allow icmp from me to any out keep-state

# Deny everything else
/sbin/ipfw add deny log ip from any to any
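
As an aside, the outbound TCP rules in /etc/rc.firewall are identical apart from the port number, so they could also be generated from a list. This is only a sketch of that idea and not a replacement for the rule set itself; the TCP_PORTS variable and the tcp_out_rules helper are names introduced here, and the function just prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical variation: build the outbound TCP rules from a port list.
TCP_PORTS="22 80 443 636 993"    # SSH, HTTP, HTTPS, LDAPS, IMAPS

# Print one ipfw command for each port listed in $1.
tcp_out_rules() {
    for port in $1; do
        echo "/sbin/ipfw add allow tcp from me to any $port keep-state"
    done
}

tcp_out_rules "$TCP_PORTS"
```

Adding a service then means adding one number to the list.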

To enable the policy at startup, you need to place the rules listed above in a file, and make the file executable. This blog entry assumes the rules were placed in the file /etc/rc.firewall. Next, you will need to create an entry in the system startup folder. Each startup item contains a script to start and stop the service, and a property file to control when and how the service starts. To enable the firewall policy listed above, we can create a file called /Library/StartupItems/Firewall/Firewall with start, stop and restart actions:

$ cat /Library/StartupItems/Firewall/Firewall

#!/bin/sh

##
# Firewall
##

. /etc/rc.common

case "$1" in
  start)

        ConsoleMessage "Starting Firewall"

        # Activate the firewall rules
        /etc/rc.firewall > /dev/null
        ;;
  stop)
        ConsoleMessage "Stopping Firewall"
        /sbin/ipfw -f flush
        ;;
  restart)

        ConsoleMessage "Restarting Firewall"

        # Activate the firewall rules
        /etc/rc.firewall > /dev/null
        ;;
esac

exit 0

In addition to the script listed above, you will also need to create a properties file to tell OS X when the service should start, and any dependencies that need to be online before the service is started. The properties file should be placed in the same directory as the startup script, and named StartupParameters.plist. The following property file can be used along with the Firewall startup script listed above:

$ cat /Library/StartupItems/Firewall/StartupParameters.plist

{
  Description     = "Firewall";
  Provides        = ("Firewall");
  Requires        = ("NetworkExtensions","Resolver");
  OrderPreference = "Late";
  Messages =
  {
    start = "Starting firewall";
    stop  = "Stopping firewall";
  };
}
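
Putting the three files in place can be scripted. The following is a rough sketch, assuming the rule set, the startup script and the property list were saved in the current directory as rc.firewall, Firewall and StartupParameters.plist. The install_firewall function and the DESTDIR variable (handy for a dry run into a scratch directory) are made up for this example, and the file modes are my choice.

```shell
#!/bin/sh
# Hypothetical installer for the three files; run as root for a real install.
# Leave DESTDIR unset for a real install, or point it at a scratch
# directory to try the layout out first.
install_firewall() {
    root="${DESTDIR:-}"
    item="$root/Library/StartupItems/Firewall"
    mkdir -p "$root/etc" "$item" || return 1
    cp rc.firewall "$root/etc/rc.firewall" || return 1
    cp Firewall StartupParameters.plist "$item/" || return 1
    # Both scripts must be executable; the property list only needs to be readable.
    chmod 755 "$root/etc/rc.firewall" "$item/Firewall" || return 1
    chmod 644 "$item/StartupParameters.plist"
}
```

Setting DESTDIR=/tmp/fw-test and calling install_firewall builds the same directory layout under /tmp/fw-test without touching the live system.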

Once all three files are in place, you can reboot the machine, and run ‘ipfw show’ as the root user to make sure the policy is installed. Daniel Cote has a great write-up on building robust OS X firewall (ipfw) rulesets. I didn’t need some of the bells and whistles provided by Daniel’s firewall.sh.current script, so I reduced the rules to exactly what I need to filter inbound and outbound traffic. The Firewall and StartupParameters.plist files were taken from the firewall tarball on Daniel’s website, and I would like to thank him for putting together such an awesome website!

Microsoft Word is broadcasting on my network!

While performing some basic traffic analysis on my home wireless network, I noticed the following broadcast traffic:

$ tcpdump -i en1 broadcast or multicast
15:51:25.761928 IP (tos 0x0, ttl 64, id 28912, offset 0, flags [none], proto: UDP (17), length: 180) 192.168.1.10.52330 > 255.255.255.255.2222: UDP, length 152
15:52:25.765492 IP (tos 0x0, ttl 64, id 28951, offset 0, flags [none], proto: UDP (17), length: 180) 192.168.1.10.52331 > 255.255.255.255.2222: UDP, length 152
15:53:25.769116 IP (tos 0x0, ttl 64, id 28989, offset 0, flags [none], proto: UDP (17), length: 180) 192.168.1.10.52332 > 255.255.255.255.2222: UDP, length 152
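
To see at a glance which destination ports the broadcasts are aimed at, output like this can be summarized with a little awk. This is a quick sketch that assumes the one-line-per-packet format shown above with numeric ports; the dst_ports name is made up.

```shell
#!/bin/sh
# Hypothetical summarizer: read tcpdump lines on stdin and print
# "port count" for each destination port seen.
dst_ports() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == ">") {
                dst = $(i + 1)
                sub(/:$/, "", dst)          # drop the trailing colon
                n = split(dst, part, ".")   # the port is the last dotted field
                count[part[n]]++
                break
            }
        }
    }
    END { for (p in count) print p, count[p] }'
}
```

Fed the three packets above, dst_ports prints 2222 3. When reading from a live capture, giving tcpdump the -n option keeps the ports numeric, and -c with a packet count makes the capture end on its own so the summary gets printed.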

Gak! I disabled Rendezvous on my laptop to avoid polluting the ether, and the applications that were running shouldn’t be broadcasting messages! I was curious to see what was causing this, so I went into discovery mode. After reviewing ktrace, netstat and lsof data, I realized that the traffic was coming from Microsoft Word. It turns out that Microsoft Word sends broadcast messages to ensure that a license is only being used on a single node. This is *supposed* to help combat piracy, but I didn’t agree to this when I signed the EULA. This was extremely annoying, and what made it worse is the fact that Microsoft Word also listens on a TCP port:

$ lsof -i | grep Microsoft

Microsoft 1208 matty   19u  IPv4 0x0283a590      0t0  TCP *:3797 (LISTEN)

Last week Microsoft released several critical Office patches for the Windows platform. I am not sure yet whether these apply to OS X, so I don’t want Microsoft Office applications blindly sending or accepting data. My laptop now uses a firewall policy that selectively allows traffic and denies everything else, which stops this cruft from meandering throughout my home network. If you don’t feel like mucking with the default firewall policy, you can add an ipfw rule similar to the following to block this traffic:

$ /sbin/ipfw add deny udp from any to any 2222 out

I reckon it’s time to switch to Pages for word processing.