Fixing a broken Solaris zone


I applied the latest set of patches to my x86 Solaris 10 server this morning, and after the server rebooted I noticed that my zones didn’t start. When I ran the zoneadm utility with the “list” option, all of the zones were in the “installed” state (they should have been in the “running” state, since the autoboot property was set to true):

$ zoneadm list -vc

  ID NAME             STATUS         PATH
   0 global           running        /
   - z1-t             installed      /zones/z1-t
   - z2-d             installed      /zones/z2-d
   - z3-p             installed      /zones/z3-p
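
For a box with more than a handful of zones, picking the stopped ones out of that listing by eye gets tedious. Here is a minimal sketch of a filter that does it, assuming the four-column layout (ID NAME STATUS PATH) shown above. The function name not_running_zones is my own invention, not part of Solaris:

```shell
# Print the name of every zone whose STATUS column is not "running".
# Reads `zoneadm list -vc` style output on stdin and skips the header
# line and any blank lines.
not_running_zones() {
    awk 'NF >= 4 && $1 != "ID" && $3 != "running" { print $2 }'
}
```

It would be used as `zoneadm list -vc | not_running_zones`, and the resulting names could then be fed one at a time to `zoneadm -z NAME boot`.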

At first I thought the zones service might be in the maintenance state, but the output from the svcs command showed that theory to be incorrect:

$ svcs -a | grep zones

online 8:39:22 svc:/system/zones:default

Since the box contained several critical services, I decided to start the zones by hand and perform a postmortem analysis after the zones were back up and operational. When I ran zoneadm with the “boot” option and the name of the zone to boot, I was greeted with the following error:

$ zoneadm -z dns boot

zoneadm: zone 'dns': Failed to initialize privileges: No such file or directory
zoneadm: zone 'dns': call to zoneadmd failed

Oh good grief! After reviewing my notes, I noticed that I had applied patch 122663-06 (a libzonecfg patch) as part of the patch bundle. Configurable zone privileges are coming as part of Solaris 10 update 3, and it looks like they prematurely made their way into a Solaris 10 patch. Since I have not had a chance to play with configurable privileges, I decided to create a new zone to see if zonecfg worked OK, and also to see if configurable privileges required additional attributes:

$ zonecfg -z test

test: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:test> create
zonecfg:test> info
zonepath:
autoboot: false
pool:
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
zonecfg:test> set zonepath=/zones/test
zonecfg:test> commit
ld.so.1: zonecfg: fatal: relocation error: file /usr/sbin/zonecfg: symbol zonecfg_add_index: referenced symbol not found
Killed
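
The relocation error says the runtime linker could not find the symbol zonecfg_add_index in any library that zonecfg loaded, which usually points to a binary and a shared library from mismatched versions. When sifting through logs for this kind of failure, a small filter helps. This is a rough sketch that assumes the exact message layout shown above, and missing_symbols is a made-up name:

```shell
# Read ld.so.1 messages on stdin and print "BINARY SYMBOL" for each
# "referenced symbol not found" relocation error. Other lines are ignored.
missing_symbols() {
    sed -n 's/^ld\.so\.1: .*relocation error: file \([^:]*\): symbol \([^:]*\): referenced symbol not found.*$/\1 \2/p'
}
```

Running `ldd -r` against the binary named in the first column should report the same unresolved symbols without having to trigger the failure again.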

Well that isn’t good, and the output from the info command doesn’t seem to indicate that new attributes were added. I needed to get the box up and running, so I decided to try to back out patch 122663-06. When I ran patchrm to remove the patch, it bombed out since it wasn’t able to start the zones:

$ patchrm 122663-06

Validating patches...

Loading patches installed on the system...

Done!

Checking patches that you specified for removal.

Done!

Approved patches will be removed in this order:

122663-06
Preparing checklist for non-global zone check...

Checking non-global zones...

Booting non-global zone dns for patch check...
ERROR: unable to boot zone: problem running on zone : error 1
zoneadm: zone 'dns': Failed to initialize privileges: No such file or directory
zoneadm: zone 'dns': call to zoneadmd failed

Can not boot non-global zone dns

Gak! Once I realized that backing out the patch with patchrm wouldn’t work, I decided to back up the libzonecfg shared library that 122663-06 had installed, and copy the previous version over it. To find the previous version, I used the find command in the /var/sadm directory:

$ cd /var/sadm

$ find . -name 122663-06

./pkg/SUNWcsr/save/pspool/SUNWcsr/save/122663-06
./pkg/SUNWcsr/save/122663-06
./pkg/SUNWzoneu/save/pspool/SUNWzoneu/save/122663-06
./pkg/SUNWzoneu/save/122663-06
./patch/122663-06

After I located the patch directories, I looked for the file named undo.Z. The undo.Z file contains a backup of each file the patch overwrites, and is used by the patchrm utility to restore a server to its previous state. To find the right undo.Z file, I ran the find command in the pkg/SUNWzoneu directory, and then used the ls “-i” (print inode) and “-l” (long output) options to print the inode number and size of each undo.Z file I found:

$ ls -li ./pkg/SUNWzoneu/save/pspool/SUNWzoneu/save/122663-06/*.Z ./pkg/SUNWzoneu/save/122663-06/*.Z

102041 -rw-r--r-- 1 root root 158534 Oct 29 08:56 ./pkg/SUNWzoneu/save/122663-06/undo.Z
101589 -rw-r--r-- 1 root root 158534 Oct 29 08:56 ./pkg/SUNWzoneu/save/pspool/SUNWzoneu/save/122663-06/undo.Z
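
Matching sizes and timestamps are a good hint, but they don't prove that two files hold the same bytes. A minimal sketch of how the lookup and a stricter comparison could be scripted follows. The function names find_undo_files and same_content are hypothetical, and the search assumes the save/PATCHID/undo.Z layout seen in the find output above:

```shell
# Print every undo.Z saved for one patch.
# $1 = directory to search (for example /var/sadm/pkg), $2 = patch ID.
find_undo_files() {
    find "$1" -name undo.Z -print | grep "/save/$2/undo\.Z\$" | sort
}

# Succeed only if the two files are byte-for-byte identical.
same_content() {
    cmp -s "$1" "$2"
}
```

For example, `find_undo_files /var/sadm/pkg 122663-06` would list the candidates, and `same_content FILE1 FILE2 && echo identical` would confirm that it doesn't matter which copy gets used.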

The sizes and timestamps of the two files were identical. (As a side note, I am curious why Sun keeps two copies of the undo.Z file. If anyone knows, please add your thoughts to the comment section.) I copied one of the files to /tmp, and used the uncompress and pkgadd utilities to extract it to /tmp/u:

$ cp undo.Z /tmp

$ cd /tmp

$ uncompress undo.Z

$ pkgadd -s /tmp/u -d undo

The following packages are available:
1 SUNWzoneu Solaris Zones (Usr)
(i386) 11.10.0,REV=2005.01.21.16.34

Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]:
Transferring package instance

Once the packages were extracted to /tmp/u, I started to poke around to see which files were included in the package:

$ cd /tmp/u/SUNWzoneu

$ find .

.
./pkginfo
./pkgmap
./install
./install/checkinstall
./install/postinstall
./reloc
./reloc/usr
./reloc/usr/lib
./reloc/usr/lib/amd64
./reloc/usr/lib/amd64/libzonecfg.so.1
./reloc/usr/lib/libzonecfg.so.1
./reloc/usr/share
./reloc/usr/share/lib
./reloc/usr/share/lib/xml
./reloc/usr/share/lib/xml/dtd
./reloc/usr/share/lib/xml/dtd/zonecfg.dtd.1

Since the new version of libzonecfg.so was most likely the cause of my problems, I backed up the shared libraries the patch had installed in /usr/lib and /usr/lib/amd64, and then replaced these with the versions I had extracted to /tmp/u:

$ cd /usr/lib

$ pwd
/usr/lib

$ cp libzonecfg.so.1 libzonecfg.so.1.orig

$ cp /tmp/u/SUNWzoneu/reloc/usr/lib/libzonecfg.so.1 .

$ cd /usr/lib/amd64

$ pwd
/usr/lib/amd64

$ cp libzonecfg.so.1 libzonecfg.so.1.orig

$ cp /tmp/u/SUNWzoneu/reloc/usr/lib/amd64/libzonecfg.so.1 .
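
The backup-then-copy dance above is easy to get wrong at 8 in the morning, so here is a minimal sketch of the same two steps wrapped in a function. The name replace_with_backup is hypothetical, and the check that refuses to clobber an existing .orig file is my own addition rather than something I did by hand:

```shell
# Replace a file with a known-good copy, keeping the current one as FILE.orig.
# $1 = replacement file, $2 = file to replace.
# Refuses to run if FILE.orig already exists, so a second run cannot
# overwrite the saved copy.
replace_with_backup() {
    new=$1
    target=$2
    [ -f "$new" ] || { echo "missing replacement: $new" >&2; return 1; }
    [ -f "$target" ] || { echo "missing target: $target" >&2; return 1; }
    [ ! -e "$target.orig" ] || { echo "backup exists: $target.orig" >&2; return 1; }
    # Back up first, and copy the replacement in only if the backup succeeded.
    cp -p "$target" "$target.orig" && cp "$new" "$target"
}
```

It would be called once per library, for example `replace_with_backup /tmp/u/SUNWzoneu/reloc/usr/lib/libzonecfg.so.1 /usr/lib/libzonecfg.so.1`.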

Once the old version of libzonecfg was in place, I was able to boot my zones without issue:

$ zoneadm -z dns boot

This experience once again leads me to wonder if Sun actually tests patches prior to sending them out to the public (this is my second bad experience in as many months). Now to schedule another downtime to properly back out the patch. :(

This article was posted by Matty on 2006-10-30 19:46:00 -0400