Archive
Posts in Solaris
Getting notified when hardware breaks
With the introduction of Solaris 10, the Solaris kernel was modified and userland tools were added to detect and report on hardware faults. Fault analysis is handled by the Solaris fault manager, which currently detects and responds to failures in AMD and SPARC CPU modules, PCI and PCIe buses, memory modules and disk drives (the kernel can retire memory pages, CPUs, etc. when it detects faulty hardware). Support for Intel Xeon processors and system sensors (e.g., fan speed and thermal sensors) is expected to follow. To see if the fault manager has diagnosed a faulty component on a Solaris 10 or Nevada host, the fmadm utility can be run with the "faulty" option: The fmadm output includes the suspect component, the state of the component and a unique identifier for the fault…
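For readers who want to script the check described in this excerpt, here is a minimal sketch in Python. It is not taken from the original post. The function names are hypothetical, and it assumes that `fmadm faulty` prints nothing when no faults have been diagnosed. That assumption may not hold on every release, so the emptiness test may need adjusting.

```python
import subprocess


def faulty_report(cmd=("fmadm", "faulty")):
    """Run the fault manager query and return its output with whitespace trimmed.

    The default command is the one named in the post; the parameter exists
    only so the function can be exercised on hosts without fmadm.
    """
    result = subprocess.run(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        raise RuntimeError("command failed: " + result.stderr.strip())
    return result.stdout.strip()


def needs_attention(report):
    # Assumption: an empty report means no faults have been diagnosed.
    return bool(report)
```

One way to use this would be to call `faulty_report()` from a scheduled job (fmadm normally needs root privileges) and print or mail the report whenever `needs_attention()` returns true.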
Getting orca working on Solaris hosts
I installed the SE Toolkit on several Solaris 10 hosts this week, and noticed that the se process was segfaulting during startup: Aug 10 23:09:27 nbm01 genunix: [ID 603404 kern.notice] NOTICE: core_log: se.sparcv9[17571] core dumped: /var/core/core.se.sparcv9.17571 After filing a bug report on the SE Toolkit website, it dawned on me that the issue wasn't with the se program, but with the orcallator.se script that accompanied the SE Toolkit. Based on a hunch, I installed the latest version of orcallator.se from the Orca website, and that fixed my issue (and provided a number of additional useful performance graphs!). Hopefully this will help others who bump into this issue.
Deploying highly available Oracle databases with Sun Cluster 3.2
While preparing for my Sun Cluster 3.2 exam, I got a chance to play with a number of the Sun Cluster 3.2 data services. One of my favorite data services was the Oracle HA data service, which allows Sun Cluster to monitor databases and fail them over in response to system and application failures. Configuring the Oracle HA data service is amazingly easy, and it took me all of about 5 minutes (plus two hours reading through the Oracle data service documentation, installing Oracle and creating a database). Here are the steps I used to configure Sun Cluster 3.2 to fail over an Oracle 10g database between two nodes: Step 1…
All Solaris core files are not created equally!
While playing around with the Solaris process.max-core-size resource control last week, I noticed some interesting behavior. If you set the system-wide maximum size of a core file with projmod: and then attempt to generate a core file: The core file that is produced will match the value of the resource control: Now the interesting part. If you have global core file support enabled, the size of the core file written to the global core file directory is not limited: total 6692 drwxr-xr-x 2 root root 512 May 22 14:29 . drwxr-xr-x 44 root sys 1024 May 22 14:28 …
Solaris SMART support is finally becoming a reality!!
A while back I wrote a blog entry about the lack of SMART support in Solaris. Just recently, Eric Schrock added an FMA disk-transport diagnosis engine, which provides generic SMART monitoring as part of the base operating system. The disk-transport diagnosis engine currently only supports SATA disk drives, but SCSI support is right around the corner! This is exciting news, and I am stoked that SMART support is finally becoming a reality!