A completely (local) diskless datacenter with iSCSI

Being able to boot a machine from SAN isn’t exactly a new concept.  Instead of putting local hard drives in thousands of machines, each machine logs into the fabric and boots its O/S from a LUN exported over Fibre Channel on the SAN.  This requires a bit of configuration on the Fibre Channel HBA, but it has the advantage that you no longer deal with local disk failures.

In OpenSolaris Nevada build 104 on x86 platforms, iSCSI boot was incorporated.

If you have a capable NIC, you can achieve the same “boot from SAN” results as Fibre Channel, but without the cost of an expensive fiber SAN network.  Think of the possibilities here:

Implement a new Amber Road Sun Storage 7000 series NAS device like the 7410 exporting hundreds of iSCSI targets, one for each of your machines; back them with ZFS volumes; and leverage ZFS snapshots, clones, and so on with the iSCSI root file system volumes for your machines.  Even if the “client” machine mounts a UFS root filesystem over iSCSI, the backend would be a ZFS volume.
Want to provision 1,000 machines in a day?  Build one box, ZFS snapshot/clone the volume, and create 1,000 iSCSI targets.  The only remaining work is configuring the OpenSolaris iSNS server with initiator/target pairings.  Instant O/S provisioning from a centrally managed location.
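Here is a rough sketch of what that snapshot/clone provisioning flow could look like with plain ZFS commands.  The pool and volume names are made up, and I’m using the legacy shareiscsi property here rather than COMSTAR:

zfs create -V 20G tank/golden                 # master O/S image on a ZFS volume
zfs snapshot tank/golden@build104             # point-in-time snapshot of the gold image
zfs clone tank/golden@build104 tank/host0001  # near-instant, space-efficient clone
zfs set shareiscsi=on tank/host0001           # export the clone as an iSCSI target

Script the last two commands in a loop and each of the 1,000 machines gets its own copy-on-write root volume that only consumes space for the blocks it changes.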

Implement two Sun Storage 7410s with clustering, and now you have an HA solution for every O/S instance running in your datacenter.

This is some pretty cool technology.  Now there is only one machine where you replace failed disks, instead of thousands, at a fraction of the cost it would take to implement this on a Fibre Channel fabric!  Once this technology works out the kinks and becomes stable, it could be the future of server provisioning and management.

Resource controls against fork bombs executed inside Solaris Zones

I came across this neat little tidbit on page 27 while reading through the PDF article UNDERSTANDING THE SECURITY CAPABILITIES OF SOLARIS™ ZONES SOFTWARE.

As a test, I’m going to set this resource control on a zone and execute a fork bomb to see what appears in the system logs (see the sketch after the excerpt below).  This is pretty cool stuff!

Miscellaneous Controls
One well-known method to over-consume system resources is a fork-bomb. This method does not necessarily consume a great deal of memory or CPU resources, but rather seeks to use up all of the process slots in the kernel’s process table. In the Solaris OS, a running process starts with just one thread of execution, also called a Light Weight Process (LWP). Many programs generate new threads, becoming multithreaded processes. By default, Solaris systems with a 64-bit kernel can run over 85,000 LWPs simultaneously. A booted zone that is not yet running any applications has approximately 100 to 150 LWPs. To prevent a zone from using too many LWPs, a limit can be set on their use. The following command sets a limit of 300 LWPs for a zone.
global# zonecfg -z web
zonecfg:web> set max-lwps=300
zonecfg:web> exit
global# zoneadm -z web reboot
 
 
This parameter can be used, but should not be set so low that it impacts normal application operation. An accurate baseline for the number of LWPs for a given zone should be determined in order to set this value at an appropriate level. The number of LWPs used by a zone can be monitored using the following prstat command.

global# prstat -LZ
[…]
ZONEID  NLWP  SWAP  RSS   MEMORY  TIME     CPU   ZONE
     0   248  468M  521M    8.6%  0:14:18  0.0%  global
    37   108   76M   61M    1.0%  0:00:00  0.0%  web
Total: 122 processes, 356 lwps, load averages: 0.00, 0.00, 0.01

In this example, the web zone currently has 108 LWPs. This value changes as processes are created or exit. It should be inspected over a period of time in order to establish a more reliable baseline, and updated when the software, requirements, or workload change.

Using the max-lwps resource control successfully usually requires the use of a CPU control, such as FSS or pools, to ensure that there is enough CPU power in the global zone for the platform administrator to fix any problems that might arise.
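Here’s a minimal sketch of the test I have in mind, assuming the capped zone is named web as in the excerpt above.  Note that the deny action happens regardless, but nothing shows up in syslog unless the rctl’s syslog action is enabled first:

global# rctladm -e syslog zone.max-lwps       # log violations of this rctl to syslog
global# prctl -n zone.max-lwps -i zone web    # confirm the 300-LWP cap is in place
web# perl -e 'fork while 1'                   # crude fork bomb inside the capped zone
global# tail -f /var/adm/messages             # watch for the cap being hit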

zpool shrink / evict is almost here

The inability to remove devices from ZFS zpools has been one of the most annoying and inflexible things about ZFS.  I once read a blog post about someone who added a USB flash stick to the root ZFS pool, and the USB stick became a permanent fixture of the machine!  There was no simple way to fix it other than backup, network dump, rebuild the machine, and network restore.
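For anyone who hasn’t hit this, here is a rough sketch of how that kind of mistake happens (the pool and device names are made up).  zpool attach mirrors an existing device and is reversible; zpool add grafts the device on as a new top-level vdev, and today zpool remove only works for hot spares and cache devices:

zpool attach tank c0t1d0 c5t0d0    # intended: mirror an existing disk (undo with zpool detach)
zpool add tank c5t0d0              # the mistake: new top-level vdev, data now stripes onto the USB stick
zpool remove tank c5t0d0           # fails today -- only spares and cache devices can be removed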

Matthew Ahrens worked on changing the ZFS scrub code in snv_94 (which made its way into the Fishworks Amber Road NAS appliance), and that work also lays the groundwork for a future “zpool evict” or the like.  It is a great read on the inner workings of how block pointers are used within ZFS.

Helpful shell shortcuts

So this may be a little basic, but I find myself using these two shortcuts quite a bit while at the shell.

If you ever find yourself wanting to “reuse” the last argument of the previous command, use the !$ history shortcut.  For example, here I move a file from one location into /var/tmp and then “cd” into /var/tmp without having to type it out:

locutus:~
(svoboda)> dd if=/dev/zero of=/tmp/blah bs=1024000 count=1
1+0 records in
1+0 records out
1024000 bytes (1.0 MB) copied, 0.0109023 s, 93.9 MB/s

locutus:~
(svoboda)> mv /tmp/blah /var/tmp

locutus:~
(svoboda)> cd !$
cd /var/tmp

locutus:/var/tmp
(svoboda)> pwd
/var/tmp

If you want to “preface” your last command, you can type anything you like at the prompt followed by the !! history shortcut, which expands to the previous command:

locutus:/var/tmp
(svoboda)> "Armin van Buuren's a State of Trance"
-bash: Armin van Buuren's a State of Trance: command not found

locutus:/var/tmp
(svoboda)> echo !!
echo "Armin van Buuren's a State of Trance"
Armin van Buuren's a State of Trance

The first line of output merely shows the expanded command that is about to be executed, and the second line is the result of actually running it.  Not rocket science, but whatever helps save keystrokes!
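One more usage note along the same lines (just a sketch, not part of the transcripts above): !! is handy when you forget to run something with elevated privileges; type the privileged wrapper and let !! fill in the rest.

(svoboda)> cat /var/adm/messages      # fails with a permission error for a normal user
(svoboda)> pfexec !!                  # expands to: pfexec cat /var/adm/messages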

Crossbow (network virtualization) is now part of the OpenSolaris code base!

It has been a long time coming, but it looks like the network team has finally integrated project Crossbow (network virtualization) into Nevada build 105:

Comments:
PSARC/2006/357 Crossbow – Network Virtualization and Resource Management
6498311 Crossbow – Network Virtualization and Resource Management
6402493 DLPI provider loopback behavior should be improved
6453165 move mac capabs definitions outside mac.h
6338667 Need ability to use NAT for non-global zones
6692884 several threads hung due to deadlock scenario between aggr and mac
6768302 dls: soft_ring_bind/unbind race can panic in thread_affinity_set with cpu_id == -1
6635849 race between lacp_xmit_sm() and aggr_m_stop() ends in panic
6742712 potential message double free in the aggr driver
6754299 a potential race between aggr_m_tx() and aggr_port_delete()
6485324 mi_data_lock recursively held when enabling promiscuous mode on an aggregation
6442559 Forwarding perf bottleneck due to mac_rx() calls
6505462 assertion failure after removing a port from a snooped aggregation
6716664 need to add src/dst IP address to soft ring fanout

This is exciting news, and I can’t wait to play with all the cool new stuff crossbow offers!
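The bit I most want to play with is VNICs: carving a physical NIC into multiple virtual NICs, each with its own bandwidth cap.  A rough sketch of the new dladm syntax (the link and VNIC names here are hypothetical):

dladm create-vnic -l e1000g0 vnic1                 # virtual NIC on top of a physical link
dladm create-vnic -l e1000g0 -p maxbw=100M vnic2   # second VNIC capped at 100 Mbit/s
dladm show-vnic                                    # list VNICs and the links they ride on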

OpenSolaris IPS repository offerings growing

I’m really glad to see the number of packages available in the OpenSolaris IPS repositories growing.  Large network repositories with thousands of software packages are part of what makes Fedora and Ubuntu the great, easy-to-use Linux distributions that they are.  Extending the number of packages available for OpenSolaris builds on that same usability!

A graph explaining the IPS repository structure, the forum post showing how to enable the pending repository, and a complete list of the 1708 pending IPS repository packages can be found here.
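If you want to try the pending repository yourself, the rough flow looks something like this.  The publisher name and URL are from memory, so double-check them against the forum post, and the package name is just a placeholder:

pfexec pkg set-publisher -O http://pkg.opensolaris.org/pending/ pending   # register the pending repo
pkg publisher                                                             # confirm it shows up
pfexec pkg install some-package                                           # pull down something to test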

The OpenSolaris community really needs people to help find and submit bugs in these packages.  If you were looking for a way to assist the OpenSolaris community, here’s your chance!