Adding status support to the Solaris dd utility

I really dig Solaris, but it is missing a few basic features I have come to rely on. One of these is the ability to send the dd utility a signal and have it report the status of a copy operation. Since Solaris is now opensource (or mostly opensource), I thought I would hack some code together to implement this feature. After a bit of coding (and testing), I requested a sponsor to put back my change into the OpenSolaris source tree. One was assigned, and I sent my changes to him for review. I haven’t heard back from him in the past month or two, so I reckon he’s too busy to help me put back my changes. In case others are interested in this feature, I placed a diff with my changes on my website.

Configuring the V40z SP to email home when problems occur

Having now worked with the Sun V40z for more than a year, I can safely say that it is one of the best server platforms I have ever used. It has incredible lights-out management, does a killer job of monitoring the platform environmentals, and can be configured to alert staff to problems it detects. All of these features are made available through the service processor (SP), which is an out-of-band device dedicated to monitoring and management. Since the service processor constantly polls the platform environmentals, it knows immediately when a problem arises, and can be configured to send email or an SNMP trap with a detailed explanation of the issue it detected.

To configure email notifications, you will first need to configure one or more DNS servers so the service processor can resolve the names of your SMTP servers (you can also use IP addresses, but that is a maintenance headache). To configure two DNS servers, the service processor’s “sp” command can be run with the “enable” option, the “dns” keyword, and one “-n” argument per DNS server:

$ sp enable dns -n 192.168.1.1 -n 192.168.1.2

To view the configured DNS servers, the sp command can be run with the “get dns” option:

$ sp get dns
Name Server(s) Search Domain(s)
192.168.140.7,192.168.140.6

After DNS is configured and verified, the sp utility can be used to set an SMTP server. The following example sets the “From:” line that will be used in all outbound emails, and configures an SMTP server to route mail through:

$ sp set smtp server -f loopy@prefetch.net smtp.prefetch.net

To verify the SMTP settings, the sp utility can be run with the “get smtp server” option:

$ sp get smtp server
Server From Address
smtp.prefetch.net loopy@prefetch.net

Once the SMTP server(s) are configured, you will need to tell the service processor which events should generate email, and the address to send those events to. To generate email when informational, warning and critical events occur, the sp utility can be run with the “update smtp subscriber” option, the subscriber for each event notification level (“-n”), and the address to send the alerts to (“-r”):

$ sp update smtp subscriber -n SMTP_Crit_Long -r zematty@prefetch.net

$ sp update smtp subscriber -n SMTP_Info_Long -r zematty@prefetch.net

$ sp update smtp subscriber -n SMTP_Warn_Long -r zematty@prefetch.net

Now each time an event occurs, the service processor will send a message with details about the event. If you want to generate a test event to make sure the SP email notification facility is configured correctly, the sp utility can be run with the “create test events” option:

$ sp create test events

’Tis all about getting notified when ze hardware fails.

Solaris device in use checking

One nifty feature that recently made its appearance in Solaris 10 is device in use checking. This feature is implemented by the libdiskmgt.so.1 shared library, and allows utilities to determine whether a device is in use, and what it is being used for. This is really neat, and I love the fact that format now prints what each partition on an active device is being used for:

$ format

Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0d0 
          /pci@0,0/pci-ide@11,1/ide@0/cmdk@0,0
       1. c1d0 
          /pci@0,0/pci-ide@11,1/ide@1/cmdk@0,0
Specify disk (enter its number): 0
selecting c0d0
Controller working list found
[disk formatted, defect list found]
/dev/dsk/c0d0s0 is part of SVM volume stripe:d10. Please see metaclear(1M).
/dev/dsk/c0d0s1 is part of SVM volume stripe:d30. Please see metaclear(1M).
/dev/dsk/c0d0s3 is part of SVM volume stripe:d20. Please see metaclear(1M).
/dev/dsk/c0d0s4 is part of active ZFS pool home. Please see zpool(1M).
/dev/dsk/c0d0s7 contains an SVM mdb. Please see metadb(1M).

I digs me some Solaris!

How to enable DMA mode in x64 Solaris

While perusing the system logfiles on a server running x64 Solaris 10 6/06 this weekend, I noticed several messages similar to the following:

$ tail -10 /var/adm/messages
Jul 16 15:45:49 neutron genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Jul 16 15:45:49 neutron genunix: [ID 882269 kern.info] PIO mode 4 selected
Jul 16 15:45:49 neutron genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Jul 16 15:45:49 neutron genunix: [ID 882269 kern.info] PIO mode 4 selected
Jul 16 15:45:49 neutron genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Jul 16 15:45:49 neutron genunix: [ID 882269 kern.info] PIO mode 4 selected
Jul 16 15:45:49 neutron genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Jul 16 15:45:49 neutron genunix: [ID 882269 kern.info] PIO mode 4 selected
Jul 16 15:45:49 neutron genunix: [ID 773945 kern.info] UltraDMA mode 6 selected

The syslog entry mentioned that the “atapi-cd-dma-enabled” eeprom property can be used to enable DMA on the CD drive, so I ran the eeprom utility with the “atapi-cd-dma-enabled=1” option to enable DMA:

$ eeprom "atapi-cd-dma-enabled=1"

To verify that the setting took effect, I ran the eeprom utility:

$ eeprom

kbd-type=US-English
ata-dma-enabled=1
atapi-cd-dma-enabled=1
ttyb-rts-dtr-off=false
ttyb-ignore-cd=true
ttya-rts-dtr-off=false
ttya-ignore-cd=true
ttyb-mode=9600,8,n,1,-
ttya-mode=9600,8,n,1,-
lba-access-ok=1
prealloc-chunk-size=0x2000
bootpath=/pci@0,0/pci-ide@11,1/ide@0/cmdk@0,0:a
console=text

This stopped the messages from appearing in my system logfiles, and the CD drive is now performing great.

Reasons why people are switching from Solaris to Linux

I met up this week with a friend I hadn’t seen in a while. We chatted about life and work, and eventually started chatting about Linux and Solaris (we are both SysAdmins). My friend mentioned that his company had decided to quit buying Sun hardware in favor of Dell servers running Red Hat Linux Advanced Server. I was shocked to hear this, since my friend had actively pushed Solaris in the past, and was one of the folks I regularly got together with to discuss new technologies merged into Nevada. His company has numerous concerns surrounding Solaris 10 manageability and Sun’s lack of ACTIVE support for commonly used opensource packages. We chatted about this for hours over cocktails, and both came to the conclusion that Sun needs to address the following problems with Solaris:

1. Solaris doesn’t ship with a working and supported LAMP stack (I should probably say SAMP stack). My friend’s company is frustrated with having to manually download and build Apache, MySQL, and PHP on their Solaris boxes, and chose to move to Red Hat Advanced Server to get a working and SUPPORTED LAMP solution out of the box. I am not sure why Sun can’t ship a working and supported SAMP stack with Solaris. This seems like a no-brainer to me.

2. Several of the developers at my friend’s company have transitioned to Fedora Core on their desktops, since the desktop looks pleasant, wireless works out of the box for most chipsets, Eclipse is an installation option, and a full suite of applications is available after installation. The Fedora Core desktop is quite a bit more usable than JDS (if you don’t believe me, install Fedora Core 5 side-by-side with JDS), so developers have jumped all over it (at least those who don’t use Windows). Sun really needs to do something to improve desktop usability, and they should use the GNOME release from gnome.org versus their own variant. They also need to do something to address package management, either by adopting Blastwave or by developing a decent remotely-accessible package repository.

3. Red Hat Linux ships, and provides regular updates for, numerous opensource packages (e.g., postgres, MySQL, Apache, Samba, Bind, Sendmail, openssh, openssl, etc.). Sun, on the other hand, keeps trying to sell customers the Sun Java One stack, “modifies” opensource packages so they diverge from what is available everywhere else, and fails to provide timely bug fixes and security patches for the opensource packages it ships with Solaris (Apache, MySQL and Samba are perfect examples). Sun really needs to get some folks focused on supporting the opensource solutions people use, rather than shipping opensource software and letting the bits rot.

4. Several key ISVs are pushing Linux and Windows over Solaris, and have switched from Solaris to Linux as their tier I development platform. This typically means that developers will squash more platform-specific bugs in their product prior to shipping it, since they are using that platform daily. Sun needs to do more to get developers writing code on Solaris, since this helps Sun customers in the end.

5. Managing applications and patches on Solaris systems is a disaster, whereas Red Hat’s up2date utility is not only efficient, but has numerous options to control the patch notification and update process. It can also be used along with Red Hat’s Satellite server to provide enterprise-wide patch and application management. While Sun has kicked off an effort to address the patch and installation process, I wonder if it will be too little, too late.

6. Staying on the cutting edge with Nevada is difficult, since there is currently no way to easily and automatically upgrade from one release of Nevada to another. On Fedora Core servers, you can run ‘yum upgrade’ to get the latest bits. Having to download archives and BFU is tedious, and most admins don’t want to spend their few spare cycles BFU’ing to new releases.

7. Zones are unusable at my friend’s site, since there is currently no way to filter traffic between zones or apply QoS measures to memory, I/O and network resources, and patching a box with zones can take days in some cases (I have experienced this first hand; if you want to see for yourself, install Solaris 10, create 25 non-sparse zones, and run smpatch update). Addressing these items would allow SysAdmins to actually patch their systems, and would allow folks to sleep at night knowing that QoS measures will keep rogue applications from taking down their servers.

8. The Solaris opensource movement was great, but in our opinion it is still very much closed to the outside world. How many people outside of Sun have actually done ARC reviews, code reviews, or put back changes to the kernel source tree (there may be cases, but I can’t find them on opensolaris.org)? This is definitely not something that can be fixed overnight, but people who have to wait in a queue for a sponsor, or worse yet are ignored when they try to fix something (I filed a bug three months ago and asked to work on it, and have yet to hear back from Sun), will end up joining communities where their voice actually matters.

That said, Solaris 10 is an awesome operating system, and comes with some incredible technologies (e.g., ZFS, DTrace, FMA, etc.). I truly do hope that Sun takes some steps to address these issues, since doing so would hopefully lead to further adoption of Solaris.

Mirroring boot devices with Solaris jumpstart

A few years back I developed a script to mirror the primary boot device as part of the jumpstart process. I was amazed that jumpstart didn’t support automated mirroring, and was hopeful that Sun would eventually provide a solution. Someone at Sun was obviously paying attention to the numerous RFEs that were filed, and as of Solaris 9 you can mirror your boot device by adding “filesys” entries that use the “mirror” keyword, along with “metadb” statements, to a client profile:

$ egrep '(filesys|metadb)' profile

filesys         mirror:d0    c0t0d0s0 c0t1d0s0  free  /     logging
filesys         mirror:d10   c0t0d0s1 c0t1d0s1  1024 swap
metadb          c0t0d0s7     size 8192 count 3
metadb          c0t1d0s7     size 8192 count 3

I haven’t found a way to refer to the disks generically (e.g., as disk0 and disk1) instead of hardcoding their device names, but will report back when I do. This is cool!