I met up this week with one of my friends that I haven’t seen in a while. We chatted about life, work and eventually started chatting about Linux and Solaris (we are both SysAdmins). My friend mentioned that his company had decided to quit buying Sun hardware in favor of Dell servers running Redhat Linux Advanced Server. I was shocked to hear this since my friend had actively pushed Solaris in the past, and was one of the folks I regularly got together with to discuss new technologies merged into Nevada. His company has numerous concerns surrounding Solaris 10 manageability and Sun’s lack of ACTIVE support for commonly used opensource packages. We chatted about this for hours over cocktails, and both came to the conclusion that Sun needs to do something to address the following problems with Solaris:
1. Solaris doesn’t ship with a working and supported LAMP stack (I should probably say SAMP stack). My friend’s company is frustrated with having to manually download and build Apache, MySQL, and PHP on their Solaris boxes, and chose to move to Redhat Advanced server to get a working and SUPPORTED LAMP solution out of the box. I am not sure why Sun can’t ship a working and supported SAMP stack with Solaris. This seems like a no brainer to me.
2. Several of the developers at my friend’s company have transitioned to Fedora Core on their desktops, since the desktop looks pleasant, wireless works out of the box for most chipsets, Eclipse is an installation option, and there is a full suite of applications available after the installation. The Fedora Core desktop is quite a bit more usable than JDS (if you don’t believe me, install Fedora Core 5 side-by-side with JDS), so developers have jumped all over it (at least those that don’t use Windows). Sun really needs to do something to improve desktop usability, and they should use the GNOME release from gnome.org versus their own variant. They also need to do something to address package management, either by adopting Blastwave or developing a decent remotely-accessible package repository.
3. Redhat Linux ships and provides regular updates for numerous opensource packages (e.g., postgres, MySQL, Apache, Samba, Bind, Sendmail, openssh, openssl, etc), whereas Sun keeps trying to sell customers the Sun Java One stack, “modifies” an opensource package so that it diverges from what is available everywhere else, and fails to provide timely bug fixes and security patches for the opensource packages that are shipped with Solaris (Apache, MySQL and Samba are perfect examples). Sun really needs to get some folks focused on supporting the opensource solutions people use, versus shipping opensource software and letting the bits rot.
4. Several key ISVs are pushing Linux and Windows over Solaris, and have switched from Solaris to Linux as their tier I development platform. This typically means that developers will squash more platform-specific bugs in their product prior to shipping it, since they are using that platform daily. Sun needs to do more to get developers writing code on Solaris, since this helps Sun customers in the end.
5. Managing applications and patches on Solaris systems is a disaster, whereas Redhat’s up2date utility is not only efficient, but has numerous options to control the patch notification and update process. It can also be used along with Redhat’s satellite server to provide Enterprise wide patch and application management. While Sun kicked off an effort to address the patch and installation process, I wonder if it will be too little too late.
6. Staying on the cutting edge with Nevada is difficult, since there is currently no way to easily and automatically upgrade from one release of Nevada to another. On Fedora Core servers, you can run ‘yum upgrade’ to get the latest bits. Having to download archives and BFU is tedious, and most admins don’t want to spend their few spare cycles BFU’ing to new releases.
7. Zones are unusable at my friend’s site, since there is currently no way to filter traffic between zones or apply QOS measures to memory, I/O and network resources, and patching a box with zones can take days in some cases (I have experienced this first hand. If you want to see for yourself, install Solaris 10, create 25 non-sparse zones, and run smpatch update). Addressing these items would allow SysAdmins to actually patch their systems, and would allow folks to sleep at night knowing that the QOS measures will keep rogue applications from taking down their servers.
8. The Solaris opensource movement was great, but in our opinion it is very much closed to the outside world. How many people outside of Sun have actually done ARC reviews, code reviews, or applied a putback to the kernel source tree (there may be cases, but I can’t find them on opensolaris.org)? This is definitely not something that can happen overnight, but making people wait in a queue for a sponsor, or worse yet ignoring them when they try to fix something (I filed a bug three months ago and asked to work on it, and have yet to hear back from Sun), will cause them to join communities where their voice actually matters.
That said, Solaris 10 is an awesome Operating System, and comes with some incredible technologies (e.g., ZFS, DTrace, FMA, etc). I truly do hope that Sun takes some steps to address these issues, since it will hopefully lead to further adoption of Solaris.
I came across OOPS! An Introduction to Linux Kernel Debugging while surfing the web, and found the presentation interesting. The information on sysinfo and sysrq was especially interesting, since these modules can be valuable tools for determining why a specific version of the Linux kernel decided to bite the dust!
While reading through RFC 1813 (NFSv3 RFC), I came across the following interesting NFS error (NFS3ERR_JUKEBOX):
The server initiated the request, but was not able to complete it in a timely fashion. The client should wait and then try the request with a new RPC transaction ID. For example, this error should be returned from a server that supports hierarchical storage and receives a request to process a file that has been migrated. In this case, the server should start the immigration process and respond to client with this error.
This is nifty, and leads me to wonder if any NFS-based HSM solutions are utilizing this.
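To make the client side of that description concrete, here is a minimal Python sketch of the "wait, then retry with a new RPC transaction ID" behavior. It is not taken from any real NFS client: `call_with_jukebox_retry`, `FakeHsmServer`, and the retry limits are hypothetical names and values I made up for illustration. Only the two status codes come from RFC 1813.

```python
import itertools
import time

# Status codes as defined in RFC 1813.
NFS3_OK = 0
NFS3ERR_JUKEBOX = 10008


def call_with_jukebox_retry(send_request, max_attempts=5, delay=0.01, first_xid=1):
    """Retry a request while the server answers NFS3ERR_JUKEBOX.

    send_request is a hypothetical callable that takes an RPC transaction
    ID (XID) and returns an NFSv3 status code. Every attempt uses a fresh
    XID, as the RFC text asks. max_attempts must be at least 1.
    """
    xids = itertools.count(first_xid)
    for _ in range(max_attempts):
        xid = next(xids)               # new transaction ID for each try
        status = send_request(xid)
        if status != NFS3ERR_JUKEBOX:
            return status, xid         # success or some other error
        time.sleep(delay)              # give the server time to recall the file
    return NFS3ERR_JUKEBOX, xid        # gave up; the file is still not ready


class FakeHsmServer:
    """Stand-in for an HSM-backed server that needs time to recall a file."""

    def __init__(self, busy_replies):
        self.busy_replies = busy_replies
        self.seen_xids = []

    def handle(self, xid):
        self.seen_xids.append(xid)
        if self.busy_replies > 0:
            self.busy_replies -= 1
            return NFS3ERR_JUKEBOX
        return NFS3_OK


server = FakeHsmServer(busy_replies=2)
status, xid = call_with_jukebox_retry(server.handle)
print(status, xid, server.seen_xids)
```

A real client would presumably back off for much longer than a few milliseconds, since recalling a file from tape or other slow media can take a while.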
While rereading several sections in the Solaris DTrace user guide, I came across the following descriptions for the timestamp and vtimestamp variables:
uint64_t timestamp: The current value of a nanosecond timestamp counter. This counter increments from an arbitrary point in the past and should only be used for relative computations.
uint64_t vtimestamp: The current value of a nanosecond timestamp counter that is virtualized to the amount of time that the current thread has been running on a CPU, minus the time spent in DTrace predicates and actions. This counter increments from an arbitrary point in the past and should only be used for relative time computations.
After reading this, it dawned on me that some of the scripts I wrote should have used vtimestamp instead of timestamp (blocking operations can really skew the results). Luckily I found this now, so I can take advantage of it while debugging problems in the future.
If you have ever had to deal with a sick Redhat server, you may be familiar with the rescue, emergency and single-user modes of operation. I have heard people refer to these modes incorrectly, which can sometimes lead to some interesting stories (there are several subtle differences between them). To clear up any confusion surrounding these terms, here are the official descriptions from the Redhat administration guide:
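DTrace scripts are written in D, so the following is only an analogy in Python, not DTrace code. It shows the same skew on a blocking call: `time.perf_counter_ns()` stands in for timestamp (elapsed wall-clock time), and `time.thread_time_ns()` stands in for vtimestamp (time the current thread actually spent on a CPU). The `measure` and `blocking_call` helpers are hypothetical names of my own, and the analogy is loose, since Python obviously does not subtract any DTrace overhead.

```python
import time


def measure(func):
    """Return (wall_ns, cpu_ns) spent in func for the current thread."""
    wall_start = time.perf_counter_ns()   # plays the role of timestamp
    cpu_start = time.thread_time_ns()     # plays the role of vtimestamp
    func()
    cpu_ns = time.thread_time_ns() - cpu_start
    wall_ns = time.perf_counter_ns() - wall_start
    return wall_ns, cpu_ns


def blocking_call():
    # Sleeping blocks the thread without using the CPU, much like
    # waiting on I/O would.
    time.sleep(0.05)


wall_ns, cpu_ns = measure(blocking_call)
print("wall-clock ns:", wall_ns)
print("on-CPU ns:    ", cpu_ns)
```

The wall-clock delta includes the whole time the thread was blocked, while the on-CPU delta should be a small fraction of it. That gap is what I was folding into my results by using timestamp.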
Rescue mode provides the ability to boot a small Red Hat Enterprise Linux environment entirely from CD-ROM, or some other boot method, instead of the system’s hard drive. As the name implies, rescue mode is provided to rescue you from something. During normal operation, your Red Hat Enterprise Linux system uses files located on your system’s hard drive to do everything — run programs, store your files, and more.
In emergency mode, you are booted into the most minimal environment possible. The root file system is mounted read-only and almost nothing is set up. The main advantage of emergency mode over single-user mode is that the init files are not loaded. If init is corrupted or not working, you can still mount file systems to recover data that could be lost during a re-installation.
In single-user mode, your computer boots to runlevel 1. Your local file systems are mounted, but your network is not activated. You have a usable system maintenance shell. Unlike rescue mode, single-user mode automatically tries to mount your file system. Do not use single-user mode if your file system cannot be mounted successfully. You cannot use single-user mode if the runlevel 1 configuration on your system is corrupted.