Blog O' Matty


Generating random numbers with bash

This article was posted by Matty on 2006-02-11 13:17:00 -0400

While perusing some documentation last week, I came across the bash RANDOM variable. This nifty little built-in can be used to generate a random integer between 0 and 32767:

$ echo $RANDOM
17142

$ echo $RANDOM
1713
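
If you need the value constrained to a range, the usual trick is the modulo operator. Here is a minimal sketch (the script and variable names below are just for illustration):

$ cat roll.sh

#!/bin/bash
# Constrain RANDOM to a range with the modulo operator. RANDOM expands to an
# integer between 0 and 32767, so $((RANDOM % N)) is slightly biased toward
# low values unless N divides 32768 evenly.

d6=$(( (RANDOM % 6) + 1 ))     # a six-sided die roll (1-6)
pct=$(( RANDOM % 100 ))        # a value between 0 and 99

echo "d6=${d6} pct=${pct}"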

This could definitely be valuable for solving some bash-specific problems, but nothing can replace /dev/random and a hardware random number generator.
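
If you need something stronger inside a script, and your system provides /dev/urandom (recent Solaris and Linux releases do), you can pull a couple of bytes straight from the kernel's random device. A rough sketch using the POSIX od syntax:

$ od -A n -t u2 -N 2 /dev/urandom

This reads two bytes and prints them as a single unsigned 16-bit integer.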

When direct I/O attacks!

This article was posted by Matty on 2006-02-08 23:29:00 -0400

I was recently asked by one of my colleagues to investigate a slow disk subsystem on one of our servers. The administrator was getting horrible I/O throughput when unzipping files, and the Oracle installer was erroring out when attempting to perform a database upgrade. To get a better idea of just how bad the situation was, I decided to run iostat while he unzipped the file:

$ iostat -zxn 5

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 43.8 0.0 419.0 0.0 0.3 0.0 6.5 0 28 c4t18d0
0.0 26.4 0.0 393.9 0.0 0.2 0.0 6.0 0 16 c4t17d0
0.8 36.0 6.4 460.4 0.0 0.3 0.0 7.2 0 25 c4t19d0
21.2 43.8 169.6 419.0 0.0 0.4 0.0 5.4 0 35 c5t2d0
20.4 36.0 163.2 460.4 0.0 0.3 0.0 5.5 0 30 c5t3d0
20.8 26.4 166.4 393.9 0.0 0.2 0.0 4.1 0 19 c5t1d0

The iostat data shows that the busiest disks are reading and writing at most around 620KB/sec (for c5t3d0, 163.2 kr/s + 460.4 kw/s), and that the average I/O size is only around 11-12KB (roughly 620KB/sec spread across about 56 I/Os per second). For some reason I found this amusing, so I put my current task on hold and started to debug the problem (I stayed late to finish that task, though). To start debugging the issue, I decided to time the unzip operation on the problematic RAID 1+0 volume (which was using VxFS):

$ date ; unzip -q fj_9_Recommended.zip ; date
Fri Jan 20 15:04:22 EST 2006 Fri Jan 20 15:16:34 EST 2006

Ouch – it took 12 minutes to unzip a 200MB file, which is DEFINITELY not normal. To see how “off” we were, I unzipped the same file on a RAID 1 volume with two mirrored disks (also using VxFS):

$ date ; unzip -q fj_9_Recommended.zip ; date
Fri Jan 20 16:43:22 EST 2006 Fri Jan 20 16:44:54 EST 2006

This operation completed in about a minute and a half, and with far fewer disks. I attributed this speed increase to the Solaris page cache, and decided to run vxtrace to get a better idea of which I/O operations were generated as part of the unzip operation:

$ vxtrace -g oradg -l -o disk | grep START

309986 394877809 START write disk c5t1d0s2 op 0 block 20517834 len 2
309987 394877809 START write disk c4t17d0s2 op 0 block 20517834 len 2
309988 394877809 START read disk c5t1d0s2 op 0 block 3487120 len 16
309989 394877810 START write disk c5t1d0s2 op 0 block 2567 len 5
309990 394877810 START write disk c4t17d0s2 op 0 block 2567 len 5
309991 394877811 START read disk c5t1d0s2 op 0 block 3487136 len 16
309992 394877812 START write disk c5t1d0s2 op 0 block 20517836 len 12
309993 394877812 START write disk c4t17d0s2 op 0 block 20517836 len 12
309994 394877813 START read disk c5t1d0s2 op 0 block 3487152 len 16
309995 394877814 START write disk c5t1d0s2 op 0 block 2572 len 5
309996 394877814 START write disk c4t17d0s2 op 0 block 2572 len 5
309997 394877815 START read disk c5t1d0s2 op 0 block 3487168 len 16
309998 394877816 START write disk c5t1d0s2 op 0 block 20517848 len 4
309999 394877816 START write disk c4t17d0s2 op 0 block 20517848 len 4
310000 394877816 START read disk c5t1d0s2 op 0 block 3487184 len 16

Well that’s odd – it looks like there are a bunch of very small writes and a steady stream of 16-sector (8k) reads. Since the box had plenty of free memory, and the previous unzip operation had completed successfully, I started to wonder why any reads were being issued at all (the file’s data should have been sitting in the Solaris page cache). To get another frame of reference for comparison, I decided to run vxtrace while I unzipped the file on the RAID 1 volume:

$ vxtrace -g rootdg -l -o disk | grep START |more

385161 395719986 START write disk c5t0d0s2 op 0 block 19756 len 128
385162 395719986 START write disk c4t16d0s2 op 0 block 19756 len 128
385163 395719987 START write disk c5t0d0s2 op 0 block 25868032 len 64
385164 395719987 START write disk c4t16d0s2 op 0 block 25868032 len 64
385165 395719987 START write disk c5t0d0s2 op 0 block 25832064 len 64
385166 395719987 START write disk c4t16d0s2 op 0 block 25832064 len 64
385167 395719987 START write disk c5t0d0s2 op 0 block 25832128 len 128
385168 395719987 START write disk c4t16d0s2 op 0 block 25832128 len 128
385169 395719987 START write disk c5t0d0s2 op 0 block 25832256 len 128
385170 395719987 START write disk c4t16d0s2 op 0 block 25832256 len 128
385171 395719988 START write disk c5t0d0s2 op 0 block 25832384 len 128
385172 395719988 START write disk c4t16d0s2 op 0 block 25832384 len 128
385173 395719988 START write disk c5t0d0s2 op 0 block 25832512 len 128
385174 395719988 START write disk c4t16d0s2 op 0 block 25832512 len 128
385175 395719988 START write disk c5t0d0s2 op 0 block 25832640 len 128
385176 395719988 START write disk c4t16d0s2 op 0 block 25832640 len 128

As you can see from the vxtrace output, the unzip operation on this volume issued zero reads and a steady stream of large 64- and 128-sector (32k and 64k) writes. Once I reviewed the data, I began to suspect that the file system configuration was at fault, and decided to run mount and fstyp to view the mount options and file system layout:

$ mount | grep u02

/u02 on /dev/vx/dsk/oradg/oradgvol01 read/write/setuid/mincache=direct/convosync=direct/delaylog/largefiles/ioerror=mwdisable/dev=3bd4ff0 on Mon Dec 5 22:21:31 2005

$ fstyp -v /dev/vx/dsk/oradg/oradgvol01 |more

vxfs
magic a501fcf5 version 5 ctime Mon Apr 25 16:46:13 2005
logstart 0 logend 0
bsize 1024 size 320046080 dsize 320046080 ninode 0 nau 0
defiextsize 0 ilbsize 0 immedlen 96 ndaddr 10
aufirst 0 emap 0 imap 0 iextop 0 istart 0
bstart 0 femap 0 fimap 0 fiextop 0 fistart 0 fbstart 0
nindir 2048 aulen 32768 auimlen 0 auemlen 8
auilen 0 aupad 0 aublocks 32768 maxtier 15
inopb 4 inopau 0 ndiripau 0 iaddrlen 8 bshift 10
inoshift 2 bmask fffffc00 boffmask 3ff checksum da5e4db
oltext1 35 oltext2 16642 oltsize 1 checksum2 422d
free 286678607 ifree 0
efree 5 3 11 9 3 1 0 1 1 0 1 1 1 0 1 26 17 12 8 4 4 1 1 0 2 1 1 1 0 0 0 0

Well there we go. It looks like the volume was mounted with the VxFS “mincache=direct” and “convosync=direct” options, which are used to enable direct I/O on VxFS file systems. Since this volume was primarily used to process files, direct I/O was causing a physical I/O operation to be issued for each read() and write(). And since direct I/O has the side effect of disabling write clustering and file system read-ahead, I knew immediately that this was the problem. I ended up fixing the issue by remounting the file system without the “mincache” and “convosync” options. After the remount completed, I once again unzipped the file on the RAID 1+0 volume, and the operation took all of 30 seconds to complete (thank you, read-ahead and write clustering). I love debugging!
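
For reference, the fix was along these lines (a rough sketch based on the device and mount point shown above – check the mount_vxfs(1M) man page for the exact options your VxFS release supports, and note that some releases also let you change these settings on the fly with the remount option):

$ umount /u02
$ mount -F vxfs -o delaylog,largefiles /dev/vx/dsk/oradg/oradgvol01 /u02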

Using DTrace to measure Apache request processing time

This article was posted by Matty on 2006-02-08 23:19:00 -0400

While fiddling with Apache and DTrace this week, I wrote a script (requestime.d) to measure the time it takes to read and process each request sent to an httpd process:

$ dtrace -q -32 -s requestime.d -p `pgrep httpd`

Processed / in 3023 microseconds
Processed /apache_pb.gif in 752 microseconds

$ cat requestime.d

/* Flag that a request header read has started. */
pid$target::ap_read_request:entry
{
        self->apread = 1;
}

/* Grab a timestamp the first time apr_time_now() fires during the read. */
pid$target::apr_time_now:entry
/ self->apread /
{
        self->ts = timestamp;
        self->apread = 0;
}

/*
 * Pull the method and unparsed URI strings out of the request_rec returned
 * by ap_read_request() (the 72 and 200 byte offsets are specific to this
 * Apache build).
 */
pid$target::ap_read_request:return
{
        self->method = arg1 == 0 ? "Unknown" :
            copyinstr(*(uintptr_t *)copyin(arg1 + 72, sizeof(uintptr_t)));

        self->uuri = arg1 == 0 ? "Unknown" :
            copyinstr(*(uintptr_t *)copyin(arg1 + 200, sizeof(uintptr_t)));
}

/* Print the URI and elapsed time once the request has been processed. */
pid$target::ap_process_request:return
{
        printf("Processed %s in %d microseconds\n", self->uuri,
            (timestamp - self->ts) / 1000);

        self->method = 0;
        self->uuri = 0;
        self->ts = 0;
}

Writing DTrace scripts is fun!

Super useful SysAdmin site

This article was posted by Matty on 2006-02-08 18:53:00 -0400

I don’t typically BLOG websites I find, but the sysunconfig website deserves mentioning. There are numerous useful pieces of documentation on the site, and it looks like the webmaster likes to get his hands into every facet of technology! Awesome!

Viewing name service cache statistics

This article was posted by Matty on 2006-02-08 18:42:00 -0400

On the Solaris and Linux systems I support, I like to run nscd (the Name Service Caching Daemon) to cache passwd, group and host lookup data. Caching can be extremely valuable on servers that handle lots of logins, and is especially beneficial if you are authenticating your users against a directory server (it reduces the amount of traffic sent between hosts). Since nscd can’t fill each and every request from its cache, it periodically incurs cache misses. To see how often this happens on a Solaris server, nscd can be run with the “-g” (print statistics) option:

$ nscd -g |more

nscd configuration:

0 server debug level
"/dev/null" is server log file

passwd cache:

Yes cache is enabled
1585148 cache hits on positive entries
0 cache hits on negative entries
1228 cache misses on positive entries
270 cache misses on negative entries
99.9% cache hit rate
0 queries deferred
13 total entries
0 complete cache invalidations
211 suggested size
600 seconds time to live for positive entries
5 seconds time to live for negative entries
20 most active entries to be kept valid

Yes check /etc/{passwd, group, hosts, inet/ipnodes} file for changes
No use possibly stale data rather than waiting for refresh
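
If you want a quick summary instead of eyeballing the full report, something along these lines works (a rough sketch that assumes the counter-first output format shown above and sums the hit/miss counters across every cache nscd reports; substitute nawk if your awk complains):

$ cat nscd_hitrate.sh

#!/bin/sh
# Summarize nscd cache effectiveness by adding up the hit and miss counters
# printed by "nscd -g" (counter value first, description second).

nscd -g | awk '
    /cache hits on positive entries/    { hits += $1 }
    /cache hits on negative entries/    { hits += $1 }
    /cache misses on positive entries/  { misses += $1 }
    /cache misses on negative entries/  { misses += $1 }
    END {
        total = hits + misses
        if (total > 0)
            printf("%d hits, %d misses (%.1f%% hit rate)\n",
                hits, misses, (hits / total) * 100)
    }'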

This is nifty!