Periodically I need to update my systems with new NTP server information. Since I have no idea whether the remote servers are well kept, I tend to run ntpdate in test mode prior to starting ntpd or adding ntpdate to root’s crontab. To run ntpdate in test mode, you can use the “-d” (debug) option, which goes through all of the steps of a time sync but doesn’t adjust the local clock:

$ /usr/sbin/ntpdate -d 192.168.1.1

12 Feb 17:36:36 ntpdate[24068]: ntpdate 4.1.2@1.892 Tue Feb 24 06:32:26 EST 2004 (1)
transmit(192.168.1.1)
receive(192.168.1.1)
transmit(192.168.1.1)
receive(192.168.1.1)
transmit(192.168.1.1)
receive(192.168.1.1)
transmit(192.168.1.1)
receive(192.168.1.1)
transmit(192.168.1.1)
server 192.168.1.1, port 123
stratum 3, precision -18, leap 00, trust 000
refid [192.168.1.1], delay 0.02592, dispersion 0.00000
transmitted 4, in filter 4
reference time:    c97b6a71.b4251202  Mon, Feb 12 2007 17:36:33.703
originate timestamp: c97b6ab2.f2ec3995  Mon, Feb 12 2007 17:37:38.948
transmit timestamp:  c97b6a74.f210f0e9  Mon, Feb 12 2007 17:36:36.945
filter delay:  0.02609  0.02592  0.02594  0.02596
         0.00000  0.00000  0.00000  0.00000
filter offset: 62.00322 62.00318 62.00317 62.00316
         0.000000 0.000000 0.000000 0.000000
delay 0.02592, dispersion 0.00000
offset 62.003182

In addition to printing the timestamps, ntpdate reports the offset the clock would be adjusted by (just over 62 seconds in the output above). Certain applications dislike the time jumping forward or back, so knowing the size of the adjustment ahead of time makes the ntpdate test option even more useful.
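If you check several servers, it can help to pull the offset out of the debug output and compare it against a threshold before trusting a server. The following is a minimal sketch that assumes the final “offset” line looks like the one above; ntp_offset and offset_ok are hypothetical helper names, not part of ntpdate.

```shell
#!/bin/sh
# Print the value from each "offset" line of `ntpdate -d` output read on stdin.
# Lines such as "filter offset:" are skipped because their first field differs.
ntp_offset() {
    awk '$1 == "offset" { print $2 }'
}

# Exit 0 only if exactly one offset was found and its absolute value is
# less than or equal to $1 seconds; exit 1 otherwise (including no offset).
offset_ok() {
    max=$1
    ntp_offset | awk -v max="$max" '
        NR == 1 { o = $1 + 0; if (o < 0) o = -o }
        END     { exit (NR == 1 && o <= max + 0) ? 0 : 1 }'
}

# Example usage (requires ntpdate and network access to the server):
#   /usr/sbin/ntpdate -d 192.168.1.1 2>&1 | offset_ok 1 ||
#       echo "offset too large or not reported"
```

Redirecting stderr into the pipe is a precaution, in case your build of ntpdate writes some of its debug output there.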

Posted by matty, filed under Linux Utilities, Solaris Utilities. Date: February 26, 2007, 9:57 am

I am a big fan of the Sun ONE Web Server, although I dislike the fact that by default it reports the server software and version in the HTTP “Server” header:

$ telnet localhost 80

Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Server: Sun-ONE-Web-Server/6.1
Date: Fri, 23 Feb 2007 22:41:21 GMT
Content-length: 179
Content-type: text/html
Last-modified: Tue, 20 Feb 2007 14:30:21 GMT
Accept-ranges: bytes
Connection: close

Connection closed by foreign host.

This gives out more information than I care to share, and provides remote attackers with an extra piece of information for determining the software stack in use. Luckily, the value reported in the “Server” header can be changed by adding the “ServerString” directive to magnus.conf. Here is a sample magnus.conf entry that sets the “Server” header to the string “Apache”:

ServerString Apache

Once this directive is set, the web server will return the string “Apache” instead of the string “Sun-ONE-Web-Server/6.1”:

$ telnet localhost 80

Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Server: Apache
Date: Fri, 23 Feb 2007 22:43:58 GMT
Content-length: 179
Content-type: text/html
Last-modified: Tue, 20 Feb 2007 14:30:21 GMT
Accept-ranges: bytes
Connection: close

Connection closed by foreign host.

’Tis all about not disclosing information if you don’t have to!
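To check the header without an interactive telnet session, a small filter can pull the “Server” value out of any HTTP response. This is a sketch; server_header is a hypothetical helper name, and the usage line assumes curl is installed.

```shell
#!/bin/sh
# Print the value of the first Server: header in an HTTP response read on stdin.
# Header names are case-insensitive, and HTTP lines end in CR LF, so the
# trailing carriage return is stripped before printing.
server_header() {
    awk 'tolower($1) == "server:" {
        sub(/^[^:]*:[ \t]*/, "")
        sub(/\r$/, "")
        print
        exit
    }'
}

# Example usage (sends a HEAD request, like the telnet session above):
#   curl -sI http://localhost/ | server_header
```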

Posted by matty, filed under Sun Web Server. Date: February 23, 2007, 6:47 pm

Having now worked with the Sun V40Z for more than a year, I can safely say that it is one of the best server platforms I have ever used. It has incredible lights-out management, does a killer job of monitoring the platform environmentals, and can be configured to alert staff to problems it detects. All of these features are provided by the service processor, an out-of-band device dedicated to monitoring and management. Since the service processor constantly polls the platform environmentals, it knows immediately when a problem arises, and can be configured to send email or an SNMP trap with a detailed explanation of the detected issue.

To configure email notifications, you will first need to configure one or more DNS servers so the service processor can resolve the names of your SMTP servers (you can also use IP addresses, but they are a maintenance headache). To configure two DNS servers, run the service processor “sp” command with the “enable dns” option and a “-n” flag for each DNS server:

$ sp enable dns -n 192.168.1.1 -n 192.168.1.2

To view the configured DNS servers, the sp command can be run with the “get dns” option:

$ sp get dns
Name Server(s) Search Domain(s)
192.168.140.7,192.168.140.6

After DNS is configured and verified, the sp utility can be used to set an SMTP server. The following example sets the “From:” line that will be used in all outbound emails, and configures an SMTP server to route mail through:

$ sp set smtp server -f loopy@prefetch.net smtp.prefetch.net

To verify the SMTP settings, the sp utility can be run with the “get smtp server” option:

$ sp get smtp server
Server From Address
smtp.prefetch.net loopy@prefetch.net

Once the SMTP server(s) are configured, you will need to tell the service processor which events should generate email, and the address to send them to. To generate email when critical, informational and warning events occur, the sp utility can be run with the “update smtp subscriber” option, the event notification level (“-n”), and the address to send the alert to (“-r”):

$ sp update smtp subscriber -n SMTP_Crit_Long -r zematty@prefetch.net

$ sp update smtp subscriber -n SMTP_Info_Long -r zematty@prefetch.net

$ sp update smtp subscriber -n SMTP_Warn_Long -r zematty@prefetch.net
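Since the three subscriptions differ only in the level name, they can be wrapped in a loop. This is a sketch only: subscribe_all and the SP variable (which lets you substitute echo for a dry run) are hypothetical conveniences, and the level names are simply the three used above.

```shell
#!/bin/sh
# Command used to talk to the service processor; override with SP=echo
# to print the commands instead of running them.
SP=${SP:-sp}

# Subscribe one address ($1) to the critical, informational and warning
# SMTP notification levels, stopping at the first failure.
subscribe_all() {
    addr=$1
    for level in SMTP_Crit_Long SMTP_Info_Long SMTP_Warn_Long; do
        "$SP" update smtp subscriber -n "$level" -r "$addr" || return 1
    done
}

# Example usage:
#   subscribe_all zematty@prefetch.net
```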

Now each time an event occurs, the service processor will send a message with details on the event. If you want to generate a test event to make sure the SP email notification facility is configured correctly, the sp utility can be run with the “create test events” option:

$ sp create test events

’Tis all about getting notified when ze hardware fails.

Posted by matty, filed under Solaris Misc. Date: February 23, 2007, 6:28 pm

I had a disk drive fail in one of my ZFS pools over the weekend, and needed to swap it out to restore the pool to an optimal state. To begin the swap out, I used the zpool utility to see which disk drive was faulty:

$ zpool status -v

  pool: rz2pool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Tue Feb 13 14:12:37 2007
config:

        NAME          STATE     READ WRITE CKSUM
        rz2pool       DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c1t9d0    ONLINE       0     0     0
            c1t10d0   ONLINE       0     0     0
            c1t12d0   ONLINE       0     0     0
            c2t1d0    ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              c2t2d0  UNAVAIL      0     0     0  cannot open
              c2t3d0  ONLINE       0     0     0
        spares
          c2t3d0      INUSE     currently in use
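On a pool with many disks it can be quicker to filter the status output for unhealthy devices. A minimal sketch, assuming the STATE column is the second field as in the output above; failed_devices is a hypothetical helper name.

```shell
#!/bin/sh
# Print the names of devices whose STATE column (field 2) is UNAVAIL,
# FAULTED or REMOVED in `zpool status` output read on stdin.
failed_devices() {
    awk '$2 == "UNAVAIL" || $2 == "FAULTED" || $2 == "REMOVED" { print $1 }'
}

# Example usage:
#   zpool status -v rz2pool | failed_devices
```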

Once I located the faulty device, I used cfgadm to remove the failed drive and add the new one, and then ran zpool with the “replace” option to replace the failed drive in my pool:

$ zpool replace rz2pool c2t2d0 c2t2d0

After the replace command returned, I used zpool to monitor the resilvering of the replacement drive:

$ zpool status -v

  pool: rz2pool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.10% done, 0h31m to go

config:

        NAME                STATE     READ WRITE CKSUM
        rz2pool             DEGRADED     0     0     0
          raidz2            DEGRADED     0     0     0
            c1t9d0          ONLINE       0     0     0
            c1t10d0         ONLINE       0     0     0
            c1t12d0         ONLINE       0     0     0
            c2t1d0          ONLINE       0     0     0
            spare           DEGRADED     0     0     0
              replacing     DEGRADED     0     0     0
                c2t2d0s0/o  UNAVAIL      0     0     0  cannot open
                c2t2d0      ONLINE       0     0     0
              c2t3d0        ONLINE       0     0     0
        spares
          c2t3d0            INUSE     currently in use

errors: No known data errors

All of this was done online, and with minimal interruption to the applications running on the host.
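Rather than re-running zpool status by hand, the resilver can be polled until the “resilver in progress” line goes away. A sketch assuming that wording, as seen in the output above; resilver_done and wait_for_resilver are hypothetical helper names, and the 60-second default interval is arbitrary.

```shell
#!/bin/sh
# Succeed when the status text read on stdin no longer reports a resilver in progress.
resilver_done() {
    ! grep -q 'resilver in progress'
}

# Poll pool $1 every $2 seconds (default 60) until its resilver finishes.
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    until zpool status "$pool" | resilver_done; do
        sleep "$interval"
    done
}

# Example usage:
#   wait_for_resilver rz2pool 30 && echo "rz2pool resilver finished"
```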

Posted by matty, filed under Solaris ZFS. Date: February 14, 2007, 12:30 am

I am in the process of migrating some old Solaris 8 web servers to Solaris 10, and plan to use SMF to stop and start Apache. Solaris ships with a relatively recent release of Apache (if only it included the LDAP authentication module and PHP), which is nicely integrated with SMF. If you’re a *NIX admin, you gotta love the fact that SMF will restart processes for you:

$ svcadm enable apache2

$ ps -ef | grep http

webservd 16663 16660   0 09:57:59 ?           0:00 /usr/apache2/bin/httpd -k start
webservd 16662 16660   0 09:57:59 ?           0:00 /usr/apache2/bin/httpd -k start
webservd 16661 16660   0 09:57:59 ?           0:00 /usr/apache2/bin/httpd -k start
webservd 16664 16660   0 09:58:00 ?           0:00 /usr/apache2/bin/httpd -k start
webservd 16665 16660   0 09:58:00 ?           0:00 /usr/apache2/bin/httpd -k start
    root 16660     1   2 09:57:58 ?           0:00 /usr/apache2/bin/httpd -k start

$ pkill httpd

$ ps -ef | grep http

webservd 16689 16686   0 09:58:08 ?           0:00 /usr/apache2/bin/httpd -k start
webservd 16690 16686   0 09:58:08 ?           0:00 /usr/apache2/bin/httpd -k start
webservd 16691 16686   0 09:58:08 ?           0:00 /usr/apache2/bin/httpd -k start
    root 16686     1   2 09:58:07 ?           0:00 /usr/apache2/bin/httpd -k start
webservd 16688 16686   0 09:58:08 ?           0:00 /usr/apache2/bin/httpd -k start
webservd 16687 16686   0 09:58:08 ?           0:00 /usr/apache2/bin/httpd -k start
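One way to confirm that SMF actually restarted Apache is to compare the PID of the root-owned httpd parent before and after the kill; in the output above it changed from 16660 to 16686. A minimal sketch, assuming ps -ef output in the format shown; httpd_parent_pid is a hypothetical helper name.

```shell
#!/bin/sh
# Print the PID (field 2) of the first root-owned httpd process in
# `ps -ef` output read on stdin.
httpd_parent_pid() {
    awk '$1 == "root" && /\/httpd/ { print $2; exit }'
}

# Example usage:
#   before=$(ps -ef | httpd_parent_pid)
#   pkill httpd
#   sleep 5
#   after=$(ps -ef | httpd_parent_pid)
#   [ -n "$after" ] && [ "$after" != "$before" ] &&
#       echo "httpd restarted: PID $before -> $after"
```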

Posted by matty, filed under Solaris SMF. Date: February 13, 2007, 11:53 pm

While upgrading my desktop to Fedora Core 6 this weekend, I received the following error while attempting to start one of my md arrays:

$ /sbin/mdadm -A /dev/md3 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: error opening /dev/md3: No such file or directory

To fix the issue, I had to cd into /dev and create additional md device entries with the MAKEDEV executable:

$ cd /dev && ./MAKEDEV md

Once I ran MAKEDEV, mdadm was able to start the array:

$ /sbin/mdadm -A /dev/md3 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: /dev/md3 has been started with 6 drives.

*** UPDATE ***

Instead of going through the hassle of running MAKEDEV, it looks like you can also use the mdadm “-a” option:

-a, --auto{=no,yes,md,mdp,part,p}{NN}
Instruct mdadm to create the device file if needed, possibly allocating an unused minor
number. “md” causes a non-partitionable array to be used. “mdp”, “part” or “p” causes
a partitionable array (2.6 and later) to be used. “yes” requires the named md device
to have a ‘standard’ format, and the type and minor number will be determined from
this. See DEVICE NAMES below.
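Going by the man page excerpt, assembling with “--auto=md” should make the MAKEDEV step unnecessary. A sketch under that assumption, which only passes the option when the device node is missing; assemble_md and the MDADM override (handy as MDADM=echo for a dry run) are hypothetical conveniences.

```shell
#!/bin/sh
# mdadm binary to run; override with MDADM=echo to print the command instead.
MDADM=${MDADM:-/sbin/mdadm}

# Assemble array $1 from the remaining arguments, asking mdadm to create a
# non-partitionable md device node only when $1 is not already a block device.
assemble_md() {
    md=$1
    shift
    if [ -b "$md" ]; then
        "$MDADM" -A "$md" "$@"
    else
        "$MDADM" -A "$md" --auto=md "$@"
    fi
}

# Example usage:
#   assemble_md /dev/md3 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
```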

Posted by matty, filed under Linux Storage. Date: February 11, 2007, 3:13 pm

Periodically a nasty bug will rear its head in Solaris or the latest build of Nevada, and the operating system will hang for no apparent reason. Recovering from a hang typically requires the administrator to reboot the host, which can delay getting the system back to a working state. One nice feature built into Solaris to assist with system hangs is the deadman timer. When enabled, the deadman timer causes a level 15 interrupt to fire on each CPU every second, and the handler checks whether the kernel lbolt variable (which the clock interrupt advances) has been updated. If the deadman timer detects that lbolt hasn’t changed for a period of time (the default is 500 seconds), it will induce a panic, which will cause a crash dump to be written to /var/crash (or the location you configured with dumpadm). To enable the deadman timer, you can set the “snooping” variable to 1 in /etc/system:

set snooping=1

If you would like the deadman to wait more (or less) than 500 seconds prior to inducing a panic, you can set the “snoop_interval” variable to the desired number of seconds * 100000 (the following example will induce a panic if the lbolt variable hasn’t been updated after 90 seconds):

set snoop_interval=9000000

This is a great feature, and can help isolate nasty bugs that result in system hangs. Since this feature CAN result in a system panic, you should take this into account prior to using it. The author is not liable for misuse. ;)

Posted by matty, filed under Solaris Recovery. Date: February 11, 2007, 10:16 am

Not grep!

While reviewing some shell scripts last week, I saw the infamous find | grep:

$ /usr/bin/find /foo -type f | egrep -v \*.inp

I am not really sure why more people don’t leverage the logic operations built into find:

$ /usr/bin/find /foo -type f -not -name \*.inp

This saves a fork() and exec(), and should be a bit faster. I am curious if folks use grep because it’s easier to read, or because they don’t know about the logic operations built into find. I shall need to investigate …
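The two approaches can be compared directly. Note that egrep takes a regular expression rather than a shell glob, so the pipeline needs an anchored pattern such as '\.inp$' to mean “ends in .inp”, and that “!” is the POSIX spelling of “-not”, which some find implementations do not accept. The function names below are hypothetical.

```shell
#!/bin/sh
# Pipeline version: list regular files under $1, then drop paths ending in .inp.
# The dot is escaped and the pattern anchored so only the suffix matches.
list_with_grep() {
    find "$1" -type f | grep -v '\.inp$'
}

# find-only version: no second process; "!" negates the -name test.
list_with_find() {
    find "$1" -type f ! -name '*.inp'
}

# Example usage:
#   list_with_find /foo
```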

Posted by matty, filed under UNIX Shell. Date: February 9, 2007, 10:11 pm
