A real world approach to learning new Operating Systems

New Operating Systems don’t pop into existence every day, but there is a slew of them out there. This includes various versions of Windows, the BSDs, a number of Linux distributions, Solaris, AIX, Plan 9 and several others. As a technology geek I’m always looking to learn something new, and I recently got the opportunity to expand my Operating System knowledge. I’m now spending a good bit of my time learning everything there is to know about IBM’s AIX.

When I’ve had to learn a new OS in the past, I’ve typically started off by finding a decent book on the OS (see my previous post for tips on finding cheap used books) and a machine that is capable of running it. The book allows me to pick up the basics in a short period of time, and the machine allows me to run through various scenarios to see how I would install, configure, secure and troubleshoot the Operating System, and how I would fix issues with it.

In addition to reading and “tinkering” around, I also like to make a list of things I need to get hands-on experience with. The list typically looks something like this:

– How do I correctly install the operating system?

– How do I add new storage or expand existing storage?

– How do I apply patches to the system, and how do I remove them?

– What steps do I need to go through to secure the system?

– What tools are available to monitor performance?

– How does the OS provide highly available services?

– What tools are available to view hardware and software problems?

– How does the logical volume manager work?

– How do I recover from a disk failure?

– How do I recover from a corrupted root file system?

– How do I recover the system from single user mode?

– How do I repair a broken package?

– How do I back up and restore a system (I focus on bare metal restores)?

– What network interface bonding modes are supported and how do I configure them?

– Which virtualization technologies are available?

– How do I keep up to date with security and reliability updates?

And the list goes on, and on … When my list is relatively complete I always find a way to simulate each scenario with my test machine. Breaking machines, fixing them and then documenting what you did is one of the best ways to nail down the basics. Real world experience is obviously better, but I like to have a firm grasp of the basics before I start making changes to systems that could have potentially negative effects (broken patches that hose systems, updates that cause unintended issues, etc.).

In addition to giving me some hands-on skills, the documents I produce while I’m learning are quite handy to have on standby in case I need to perform these tasks down the road. I have learned firsthand the importance of familiarizing yourself with the basics of recovering a system from various disaster scenarios, because at 2am when your company’s site is down you don’t have time to read through manuals or deal with 8 lines of support engineers. You need to get things back up, and if you learn how to deal with disaster situations ahead of time you will be calm, cool and collected at 2am (assuming the disaster is something you are able to recover from).

Once I have the basics mastered, the next thing I focus on is getting certified. While I don’t place a ton of credibility in IT certifications, I definitely feel they are a great way to expand your knowledge and learn things you might not have otherwise known. If I’m fortunate enough to have the luxury of vendor OS support, I love to open a few support cases to learn how the support organization for the OS works. I’ve yet to find two companies that operate the same way, and it’s nice to learn the system before you truly need it.

I just started reading my AIX book this past weekend, and plan to start playing around with a couple of IBM p550s I have access to. I’m also going to take my AIX certification test in a couple of weeks, so I’ll definitely be crazy busy for the next month learning as much as I can. Luckily for me I love learning new things and experimenting with technology. If you’ve had to learn a new OS in the past few years, feel free to chime in. I would love to get others’ thoughts and feedback on how they learn new stuff!

2 thoughts on “A real world approach to learning new Operating Systems”

  1. One of the things I find useful is gaining a better understanding of the disk I/O subsystem and how and when its data cache works. Personally, I find that a lot of self-proclaimed experts don’t always get it right either. One of the best services you can do for yourself is to conduct a number of experiments, as you mentioned, that focus on I/O rates and memory usage.

    For example, LOTS of people (believe it or not) think their Linux system is running out of memory as the ‘free’ size approaches 0. They don’t realize that this happens because the data cache is growing, and that this is simply how Linux works: it ages out old cached data as memory pressure increases.

    It can also be very useful to see why, when you write a small file and don’t flush the cache to disk when you’re done, you can achieve I/O rates that defy the laws of physics and think you wrote great code when in fact all you’ve done is fill the cache and close your program.

    When I first wrote collectl, I naturally wanted to verify that it was reporting correct stats. This forced me to do a lot of experiments, and as a result I gained a much better understanding. In fact I found a bug in the way disk I/O rates were recorded, and after contacting one of the developers got it fixed in the 2.6 kernel.

    My point is, if you understand what the data means, you can infer how the kernel is doing things.

    -mark
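
The two experiments Mark describes can be sketched in a few lines of Python. This is a minimal sketch and not part of the original post or of collectl: the function names and the sample numbers are made up for illustration. It assumes a Linux-style /proc/meminfo layout, and it treats "MemFree + Buffers + Cached" as only a rough estimate of memory the kernel could hand back, since not everything counted under Cached can actually be reclaimed.

```python
import os
import tempfile
import time


def summarize_meminfo(text):
    """Parse /proc/meminfo-style text and return a few sizes in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    free = fields.get("MemFree", 0)
    cache = fields.get("Buffers", 0) + fields.get("Cached", 0)
    # Rough estimate only: some of "Cached" cannot really be reclaimed.
    return {"free_kb": free, "cache_kb": cache, "free_plus_cache_kb": free + cache}


def timed_write(path, data, sync):
    """Write data to path and return the elapsed seconds.

    With sync=True the data is pushed to the device with fsync before the
    timer stops; with sync=False the write may only have reached the cache.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        if sync:
            f.flush()
            os.fsync(f.fileno())
    return time.perf_counter() - start


# Hypothetical numbers for illustration; they are not from a real machine.
SAMPLE = """MemTotal:        8000000 kB
MemFree:          120000 kB
Buffers:          300000 kB
Cached:          5200000 kB
"""


def main():
    print(summarize_meminfo(SAMPLE))
    data = b"x" * (4 * 1024 * 1024)
    with tempfile.TemporaryDirectory() as tmp:
        for sync in (False, True):
            elapsed = timed_write(os.path.join(tmp, "test.bin"), data, sync)
            print(f"sync={sync}: {len(data) / elapsed / 1e6:.1f} MB/s")


if __name__ == "__main__":
    main()
```

On a real Linux box you could pass the contents of /proc/meminfo to summarize_meminfo instead of SAMPLE. The write timings will vary by machine and filesystem, but the unsynced write will often report a rate far higher than the disk can sustain, which is the cache effect described in the comment above.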
