Prefetch Technologies // Keeping your cache lines cozy


Posts in Solaris

Figuring out what a hung Solaris process is doing inside the kernel

solaris · May 15, 2009 · 1 min read

I had a process hang last week on one of my Solaris hosts, and was curious what each thread was doing. The mdb utility is perfect for locating this information, since you can combine pid2proc with the walk and findstack dcmds to get the call stack of each thread in a process (the process I examined had process ID 48). It turns out the issue I encountered was due to a bug, which will hopefully be fixed in the near future.
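A minimal sketch of that dcmd pipeline, assuming a live kernel session started as root with mdb -k. The stack_dcmd helper name is hypothetical and only builds the pipeline string; the 0t prefix tells mdb the process ID is decimal.

```shell
# Hypothetical helper: build the mdb pipeline for one process.
# $1 = process ID in decimal (mdb reads bare numbers as hex, hence 0t).
stack_dcmd() {
    printf '0t%s::pid2proc | ::walk thread | ::findstack\n' "$1"
}

# On a Solaris host, as root, feed the pipeline to the kernel debugger:
#   stack_dcmd 48 | mdb -k
```

pid2proc turns the PID into a proc_t address, the thread walker visits each thread in that process, and findstack prints the stack for every thread it is handed.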


Using ipmitool to manage the reboot process on Solaris hosts

solaris · May 14, 2009 · 1 min

I've talked about ipmitool a couple of times in the past, and have grown to love this super useful tool. My good friend and fellow blogging partner Mike Svoboda mentioned a few weeks back that ipmitool has a bootdev option, which tells a machine what to do the next time it is power cycled. This is useful for booting a machine via PXE, sending a machine into the BIOS, booting a machine in safe mode, or booting from an attached CD-ROM or disk drive. The full list of boot options can be viewed by running ipmitool with the bootdev help option:

    bootdev [clear-cmos=yes|no]
      none  : Do not change boot device order
      pxe   : Force PXE boot
      disk  : Force boot from default Hard-drive
      safe  : Force boot from default Hard-drive, request Safe Mode
      diag  : Force boot from Diagnostic Partition
      cdrom : Force boot from CD/DVD
      bios  : Force boot into BIOS Setup

If you need to boot a machine into the BIOS, you can specify the bios target. To boot a machine via PXE, you can use the pxe option. The pxe option is incredibly powerful, since it provides some nice glue to make lights out automated build systems…
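As a rough sketch (not the exact commands from the post), a bootdev request sent over the network to a service processor could be assembled like this. The bootdev_cmd helper, the bmc1 hostname, and the root user are all placeholders I made up; the -I lanplus interface assumes the BMC speaks IPMI v2.0 over LAN. The helper only accepts the targets shown in the help output.

```shell
# Hypothetical helper: print the ipmitool command for a given boot target.
# $1 = BMC hostname (placeholder), $2 = boot target from the bootdev help list.
bootdev_cmd() {
    case "$2" in
        none|pxe|disk|safe|diag|cdrom|bios) ;;
        *) echo "unknown bootdev target: $2" >&2; return 1 ;;
    esac
    # -U root is an assumed user name; ipmitool prompts for the password.
    echo "ipmitool -I lanplus -H $1 -U root chassis bootdev $2"
}

# Print the command for a one-time PXE boot of the machine behind bmc1:
#   bootdev_cmd bmc1 pxe
```

Printing the command instead of running it makes it easy to review what will be sent before a host gets power cycled.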


Deploying highly available zones with Solaris Cluster 3.2

solaris · Apr 10, 2009 · 7 min

I discussed my first impressions of Solaris Cluster 3.2 a while back, and have been using it in various capacities ever since. One thing that I really like about Solaris Cluster is its ability to manage resources running in zones, and to fail these resources over to other zones running on the same host, or to a zone running on a secondary host. Additionally, Solaris Cluster allows you to migrate zones between nodes, which can be quite handy when resources are tied to a zone and can't be managed as a scalable service. Configuring zone failover is a piece of cake, and I will describe how to do it in this blog post…


Debugging a Solaris fault manager fault

solaris · Apr 4, 2009 · 1 min

I recently debugged an issue where a host panicked. Errors of this kind are typically generated by CPU or memory faults, but on this specific machine nothing was displayed when I checked the fault and error logs. Upon closer inspection, it looked like the fault manager wasn't running and had transitioned into the maintenance state. After poking around Sunsolve, I noticed that there were a number of issues that can cause fmd to enter the maintenance state. In this specific case, the daemon was core dumping at startup, so I had a core file readily available to help debug the source of the problem. Upon inspecting the panic string, it turned out that we were bumping into bug 6672662…
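One way to confirm the fault manager's state is to ask SMF directly. The sketch below assumes the standard FMRI svc:/system/fmd:default; the fmd_state_advice helper is hypothetical and simply maps the state string reported by svcs to a suggested next step.

```shell
# Hypothetical helper: turn an SMF state string into a next step.
# $1 = state, e.g. the output of: svcs -H -o state svc:/system/fmd:default
fmd_state_advice() {
    case "$1" in
        online)
            echo "fmd is running" ;;
        maintenance)
            echo "fmd is in maintenance: run svcs -xv svc:/system/fmd:default" ;;
        *)
            echo "fmd state is $1" ;;
    esac
}

# On a Solaris host:
#   fmd_state_advice "$(svcs -H -o state svc:/system/fmd:default)"
```

Once the underlying problem is fixed, svcadm clear svc:/system/fmd:default asks SMF to bring the service out of maintenance.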


Fixing Solaris Cluster device ID (DID) mismatches

solaris · Mar 23, 2009 · 1 min

I had to replace a disk in one of my cluster nodes, and was greeted with the following message once the disk was swapped and I checked the devices for consistency:

    cldevice: (C894318) Device ID "snode2:/dev/rdsk/c1t0d0" does not match physical device ID for "d5".
    Warning: Device "snode2:/dev/rdsk/c1t0d0" might have been replaced.

To fix this issue, I used the cldevice utility's repair option. Once the repair operation updated the devids, cldevice ran cleanly. Niiiiiiiiice!
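When several disks get swapped at once, it can help to pull the affected DID instance out of each mismatch message. The sketch below is only an illustration: the did_from_mismatch helper is hypothetical, and it assumes the message format matches the one quoted in this post.

```shell
# Hypothetical helper: extract the DID instance (e.g. d5) from the
# mismatch messages that cldevice prints. Reads messages on stdin.
did_from_mismatch() {
    sed -n 's/.*physical device ID for "\([^"]*\)".*/\1/p'
}

# On a cluster node, as I understand the utility:
#   cldevice check 2>&1 | did_from_mismatch   # list mismatched instances
#   cldevice repair                           # update the device IDs
#   cldevice check                            # should now run cleanly
```

Lines that do not match the pattern, such as the follow-up warning, are dropped, so the output is just one DID instance per mismatch.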
