Blog O' Matty


Respect my ~/.Xauthority !#@$!

This article was posted by Matty on 2008-04-05 01:09:00 -0400

South Park is a hilarious show, and I think that Cartman is the best character.  One of Cartman’s classic lines is “YOU WILL RESPECT MY AUTHORITAH!#!”

So Cartman wasn’t a unix geek and wasn’t talking about X11 Forwarding / SSH, but maybe there is a moral to the story.

You have to execute some sort of GUI program on a remote host and it requires root access in order to execute (or you have to change to a different user to execute the GUI with correct permissions)…

The first time you log into the machine without X11 forwarding enabled, your ~/.Xauthority file doesn’t exist…

cartman@locutus:~$ ls -l ~/.Xauthority
ls: /home/cartman/.Xauthority: No such file or directory

So you log back out, and when you ssh back into the remote machine, you remember to forward X11 by issuing..

$ ssh -X <user>@<remote box>

i.e.

$ ssh -X cartman@locutus
Linux locutus 2.6.22-14-generic #1 SMP Sun Oct 14 23:05:12 GMT 2007 i686
cartman@locutus:~$ xclock

Sure enough, you fire up /usr/bin/xclock (or /usr/openwin/bin/xclock) and verify that the GUI program displays back on your local desktop.

cartman@locutus:~$ echo $DISPLAY
localhost:10.0

Sweet.  Next, when you change users..

cartman@locutus:~$ su -
Password:
root@locutus:~# id
uid=0(root) gid=0(root) groups=0(root)
root@locutus:~# xclock
Error: Can’t open display:

you lose your X11 forwarding.  DOH!

So what's the solution here?  You can’t log in directly to the box as the root user (this should always be disabled; it's really bad practice if it isn’t), and you don’t really want to throw an SSH key into /root/.ssh/authorized_keys for obvious reasons. So what’s there to do?

When you SSH into a machine with X11 forwarding, sshd opens a TCP port on the remote host, tunnels the X11 traffic arriving on that port back through the SSH connection, and stores the matching MIT magic cookie in a file in your home directory called ~/.Xauthority
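If you're wondering how a display number relates to that TCP port: by the usual X11 convention, display N listens on TCP port 6000 + N, so a forwarded display of localhost:10.0 corresponds to port 6010 on the remote box. Here's a minimal sketch of that arithmetic in plain shell. x11_port is a made-up helper name for illustration, not a real utility.

```shell
# x11_port: hypothetical helper that turns a DISPLAY value into the TCP
# port an X server (or sshd's X11 proxy) conventionally listens on.
x11_port() {
    dpy=${1#*:}        # strip the host part:   localhost:10.0 -> 10.0
    dpy=${dpy%%.*}     # strip the screen part: 10.0 -> 10
    echo $(( 6000 + dpy ))
}

x11_port localhost:10.0    # prints 6010
```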

All we have to do is “move” this information along with us when we change users.  We can use the xauth command to manipulate this for us.  First, let's display what the value of our cookie is.  Note the :10 matching up to our $DISPLAY variable…

cartman@locutus:~$ xauth list
locutus/unix:10  MIT-MAGIC-COOKIE-1  e2cba22d040f0e75dcbd203ee40736de

Now let's change users..

cartman@locutus:~$ su -
Password:
root@locutus:~# ls -l /root/.Xauthority
ls: /root/.Xauthority: No such file or directory
root@locutus:~# xauth list

So no MIT cookies currently exist… That makes sense because we didn’t X11 port forward into the root account.. Let's add one, using the exact cookie value that xauth list printed for cartman.  Don’t forget the “/unix” after the hostname..

root@locutus:~# xauth add locutus/unix:10 MIT-MAGIC-COOKIE-1 e2cba22d040f0e75dcbd203ee40736de
xauth:  creating new authority file /root/.Xauthority

Exceeeelent….

root@locutus:~# xauth list
locutus/unix:10  MIT-MAGIC-COOKIE-1  e2cba22d040f0e75dcbd203ee40736de

We’re not done yet… The last thing we have to do is to set our $DISPLAY variable to the same display as above..  Right now it may be empty…

root@locutus:~# echo $DISPLAY

root@locutus:~# xclock
Error: Can’t open display:

So let's set it to localhost:10 (screen 0 is the default, so this is the same display as localhost:10.0)

root@locutus:~# export DISPLAY=localhost:10
root@locutus:~# xclock

Sure enough, we get xclock to display.  We didn’t have to be the root user in this example.  Any local user could perform the same function.
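The copy-and-paste step is easy to fumble (one wrong hex digit and the cookie won't match), so here's a rough sketch that builds the xauth add command from one line of xauth list output. xauth_add_cmd is a hypothetical helper name; it only prints the command, so you can eyeball it before running it as the other user.

```shell
# xauth_add_cmd: hypothetical helper. Takes one line of `xauth list`
# output (display name, protocol, hex key) and prints the matching
# `xauth add` command without executing anything.
xauth_add_cmd() {
    echo "$1" | while read -r dpy proto key; do
        printf 'xauth add %s %s %s\n' "$dpy" "$proto" "$key"
    done
}

# Using the cookie line from this article as sample input:
xauth_add_cmd 'locutus/unix:10  MIT-MAGIC-COOKIE-1  e2cba22d040f0e75dcbd203ee40736de'
```

You would still need to set DISPLAY by hand in the new shell, exactly as in the walkthrough.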

Alternatively, xauth also has a “merge” function that reads an existing ~<user>/.Xauthority file and merges it into your own.  This is really only going to work for the root user (unless you chmod the file) because the permissions on this file are octal 600…

root@locutus:~# ls -l /home/cartman/.Xauthority
-rw------- 1 cartman cartman 53 2008-04-05 00:22 /home/cartman/.Xauthority

Let's remove the previous Xauthority entry we had in place…

root@locutus:~# xauth list
locutus/unix:10  MIT-MAGIC-COOKIE-1  e2cba22d040f0e75dcbd203ee40736de
root@locutus:~# xauth remove locutus/unix:10

And then we’ll use the merge function pointing at a specific .Xauthority file…

root@locutus:~# xauth merge /home/cartman/.Xauthority

Sure enough, it imported correctly..

root@locutus:~# xauth list
locutus/unix:10  MIT-MAGIC-COOKIE-1  e2cba22d040f0e75dcbd203ee40736de

Our DISPLAY variable matches the display above and xclock starts up without any errors.

root@locutus:~# echo $DISPLAY
localhost:10
root@locutus:~# xclock

When xauth is invoked without any options, it brings up an interactive prompt that's pretty neat… Here are “help” and “info” in action…

root@locutus:~# xauth
Using authority file /root/.Xauthority
xauth> help
    add dpyname protoname hexkey   add entry
    exit                           save changes and exit program
    extract filename dpyname...    extract entries into file
    help [topic]                   print help
    info                           print information about entries
    list [dpyname...]              list entries
    merge filename...              merge entries from files
    nextract filename dpyname...   numerically extract entries
    nlist [dpyname...]             numerically list entries
    nmerge filename...             numerically merge entries
    quit                           abort changes and exit program
    remove dpyname...              remove entries
    source filename                read commands from file
    ?                              list available commands
    generate dpyname protoname [options]  use server to generate entry
    options are:
      timeout n    authorization expiration time in seconds
      trusted      clients using this entry are trusted
      untrusted    clients using this entry are untrusted
      group n      clients using this entry belong to application group n
      data hexkey  auth protocol specific data needed to generate the entry

xauth> info
Authority file:       /root/.Xauthority
File new:             no
File locked:          no
Number of entries:    1
Changes honored:      yes
Changes made:         no
Current input:        (stdin):2

There are also all sorts of security implications surrounding ~/.Xauthority, where the root user or administrator could hijack X11 sessions.  This article is a great read and I suggest taking a look at it when you have a chance.  It also goes into better detail on the steps of how the X11 forward occurs and the security hazards surrounding it.

A new blogger joins the prefetch family

This article was posted by Matty on 2008-03-31 17:27:00 -0400

My good friend Mike recently joined the prefetch family, and will be adding additional content to the prefetch blog (his first blog post rocked!). Mike is one of the most skilled UNIX administrators I have ever met, and I am extremely excited that he is going to add his real world experiences to this site! Welcome Mike!

Finding/setting nvalias (nvram) OBP settings from a running Solaris O/S

This article was posted by Matty on 2008-03-21 15:23:00 -0400

Using the command eeprom (1m) while in the Solaris O/S on SPARC platforms has been a useful way to view and set OBP parameters without bringing the entire machine offline and down to the ok prompt.

Unfortunately, eeprom does not show nvalias definitions. These are most often used to specify root and mirror O/S boot devices. For clarity, these are then plugged into the boot-device and diag-device OBP variables. (diag-device is the OBP variable used to boot the machine when the physical or virtual keyswitch is set to “diag mode.”)

Luckily, prtconf -vp will give you this information once you do a little bit of digging…

$ prtconf -vp

....

<snip>

....

Node 0xf022d030
ttya-rts-dtr-off: 'false'
ttya-ignore-cd: 'true'
local-mac-address?: 'true'
fcode-debug?: 'false'
scsi-initiator-id: '7'
oem-logo:
oem-logo?: 'false'
oem-banner:
oem-banner?: 'false'
ansi-terminal?: 'true'
screen-#columns: '80'
screen-#rows: '34'
ttya-mode: '9600,8,n,1,-'
output-device: 'virtual-console'
input-device: 'virtual-console'
auto-boot-on-error?: 'false'
load-base: '16384'
auto-boot?: 'true'
network-boot-arguments:
boot-command: 'boot'
boot-file:
boot-device: 'disk net'
use-nvramrc?: 'false'
nvramrc:
security-mode: 'none'
security-password:
security-#badlogins: '0'
verbosity: 'min'
diag-switch?: 'true'
error-reset-recovery: 'boot'
name: 'options'

Node 0xf022d0a8
ttya: '/pci@7c0/pci@0/pci@1/pci@0/isa@2/serial@0,3f8'
nvram: '/virtual-devices/nvram@3'
net3: '/pci@7c0/pci@0/pci@2/network@0,1'
net2: '/pci@7c0/pci@0/pci@2/network@0'
net1: '/pci@780/pci@0/pci@1/network@0,1'
net0: '/pci@780/pci@0/pci@1/network@0'
net: '/pci@780/pci@0/pci@1/network@0'
ide: '/pci@7c0/pci@0/pci@1/pci@0/ide@8'
cdrom: '/pci@7c0/pci@0/pci@1/pci@0/ide@8/cdrom@0,0:f'
disk3: '/pci@780/pci@0/pci@9/scsi@0/disk@3'
disk2: '/pci@780/pci@0/pci@9/scsi@0/disk@2'
disk1: '/pci@780/pci@0/pci@9/scsi@0/disk@1'
disk0: '/pci@780/pci@0/pci@9/scsi@0/disk@0'
disk: '/pci@780/pci@0/pci@9/scsi@0/disk@0'
scsi: '/pci@780/pci@0/pci@9/scsi@0'
virtual-console: '/virtual-devices/console@1'
name: 'aliases'

....

<snip>

....

The first thing we need to do is to set the use-nvramrc? OBP variable to true so our modifications will be used. Viewing parameters with eeprom (1m) is accessible to regular users. Modifying these values requires root privileges.

$ eeprom use-nvramrc?
use-nvramrc?=false

Since it's false..

# eeprom use-nvramrc?=true

To verify..

$ eeprom use-nvramrc?
use-nvramrc?=true

Awesome. Step 1 down.

Next, we define the device we want to use. First, figure out what your root and mirror devices are.

$ df -h /
Filesystem size used avail capacity Mounted on
/dev/md/dsk/d10 32G 22G 9.4G 70% /

$ metastat -p d10
d10 -m d11 d12 1
d11 1 1 c0t0d0s0
d12 1 1 c0t1d0s0

So we’ve got an encapsulated SVM root file system. Let's find the device paths for c0t0d0 and c0t1d0 under the /devices name space.

$ ls -l /dev/dsk/c0t0d0s0
lrwxrwxrwx 1 root root 49 Feb 6 11:06 /dev/dsk/c0t0d0s0 -> ../../devices/pci@780/pci@0/pci@9/scsi@0/sd@0,0:a

$ ls -l /dev/dsk/c0t1d0s0
lrwxrwxrwx 1 root root 49 Jan 11 2007 /dev/dsk/c0t1d0s0 -> ../../devices/pci@780/pci@0/pci@9/scsi@0/sd@1,0:a

Ok.. so removing the “/devices” from the path (since that’s just a Solaris name space) and the trailing “a” gives us the following….

/pci@780/pci@0/pci@9/scsi@0/sd@0,0

/pci@780/pci@0/pci@9/scsi@0/sd@1,0

(Side note… the “a” stands for slice 0. Slice 1 would have a “b”, slice 2 would have a “c”, etc. You can see an example of this by issuing a $ ls -l /dev/dsk/c0t0d0s1. That’s why when you boot off of cdrom, you’ll see a trailing “f”. By default, it's looking for bootstrap information on slice 5!)
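The slice-to-letter mapping is just positional (slices 0 through 7 line up with the letters a through h), which a tiny shell sketch makes concrete. slice_letter is a hypothetical helper name, used here only for illustration.

```shell
# slice_letter: hypothetical helper mapping a Solaris slice number (0-7)
# to the letter that follows the colon in the /devices path.
slice_letter() {
    case "$1" in
        [0-7]) echo "abcdefgh" | cut -c "$(( $1 + 1 ))" ;;
        *)     echo "slice must be 0-7" >&2; return 1 ;;
    esac
}

slice_letter 0    # prints a
slice_letter 1    # prints b
```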

We can confirm this by grepping for the two paths above in /etc/path_to_inst

$ grep '/pci@780/pci@0/pci@9/scsi@0/sd@[0-1],0' /etc/path_to_inst
"/pci@780/pci@0/pci@9/scsi@0/sd@0,0" 1 "sd"
"/pci@780/pci@0/pci@9/scsi@0/sd@1,0" 3 "sd"

Sweet. Step 2 down.

Next, let's assign an alias of “rootdisk” to the first device and “rootmirror” to the second. Replace the two characters “sd” with the word “disk”. Each devalias command goes on its own line inside the quotes.

# eeprom nvramrc="devalias rootdisk /pci@780/pci@0/pci@9/scsi@0/disk@0,0
devalias rootmirror /pci@780/pci@0/pci@9/scsi@0/disk@1,0"
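The path massaging from the last few steps (drop everything up to /devices, drop the slice suffix, swap “sd” for “disk”) can be sketched as a small sed pipeline. obp_path is a hypothetical helper name, and this assumes an sd-driven disk like the ones in this example; other drivers may need a different substitution.

```shell
# obp_path: hypothetical helper that converts the symlink target of a
# /dev/dsk entry into the device path used in a devalias definition.
obp_path() {
    echo "$1" | sed -e 's|^.*/devices/|/|' \
                    -e 's|:[a-h]$||' \
                    -e 's|/sd@|/disk@|'
}

# Sample input taken from the ls -l output earlier in this article:
obp_path '../../devices/pci@780/pci@0/pci@9/scsi@0/sd@0,0:a'
```

Fed the symlink target for c0t0d0s0, this prints /pci@780/pci@0/pci@9/scsi@0/disk@0,0, which is the value used for rootdisk.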

While we’re at it, let's change our boot device to point to rootdisk and rootmirror.

# eeprom boot-device="rootdisk rootmirror"

The OBP variable nvramrc is a placeholder for commands that haven't taken effect yet. Upon the next reboot, OBP evaluates the contents of nvramrc (because use-nvramrc? is now true) and the aliases become active. Let's see this in action. The whole point of this article is that we don't have to bounce the machine, but just to prove this is how it works, let's do it anyway.

# halt
syncing file systems... done
Program terminated
{0} ok

{0} ok printenv nvramrc
nvramrc = devalias rootdisk /pci@780/pci@0/pci@9/scsi@0/disk@0,0
devalias rootmirror /pci@780/pci@0/pci@9/scsi@0/disk@1,0

So there’s our value that we stuck into nvramrc.

Our modifications haven’t appeared under devalias yet because a complete boot cycle is needed before the contents of nvramrc get committed into NVRAM…

{0} ok devalias
ttya /pci@7c0/pci@0/pci@1/pci@0/isa@2/serial@0,3f8
nvram /virtual-devices/nvram@3
net3 /pci@7c0/pci@0/pci@2/network@0,1
net2 /pci@7c0/pci@0/pci@2/network@0
net1 /pci@780/pci@0/pci@1/network@0,1
net0 /pci@780/pci@0/pci@1/network@0
net /pci@780/pci@0/pci@1/network@0
ide /pci@7c0/pci@0/pci@1/pci@0/ide@8
cdrom /pci@7c0/pci@0/pci@1/pci@0/ide@8/cdrom@0,0:f
disk3 /pci@780/pci@0/pci@9/scsi@0/disk@3
disk2 /pci@780/pci@0/pci@9/scsi@0/disk@2
disk1 /pci@780/pci@0/pci@9/scsi@0/disk@1
disk0 /pci@780/pci@0/pci@9/scsi@0/disk@0
disk /pci@780/pci@0/pci@9/scsi@0/disk@0
scsi /pci@780/pci@0/pci@9/scsi@0
virtual-console /virtual-devices/console@1
name aliases

Let's bounce the box so the contents of nvramrc are committed into NVRAM.

{0} ok boot rootdisk

SC Alert: Host System has Reset

SC Alert: Host system has shut down.
..

...

Sun Fire T200, No Keyboard
Copyright 2006 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.25.0, 32760 MB memory available, Serial #XXXXXXX
Ethernet address 0:14:4f:xx:xx:xx, Host ID: xxxxxxxx

Rebooting with command: boot rootdisk
Boot device: /pci@780/pci@0/pci@9/scsi@0/disk@0,0 File and args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-T200/ufsboot
Loading: /platform/sun4v/ufsboot
SunOS Release 5.10 Version Generic_127111-05 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.

Sure enough, once the box is back online, prtconf -vp shows our modifications committed into NVRAM.

$ prtconf -vp

...

<snip>

...

Node 0xf022d0a8
rootmirror: '/pci@780/pci@0/pci@9/scsi@0/disk@1'
rootdisk: '/pci@780/pci@0/pci@9/scsi@0/disk@0'
ttya: '/pci@7c0/pci@0/pci@1/pci@0/isa@2/serial@0,3f8'
nvram: '/virtual-devices/nvram@3'
net3: '/pci@7c0/pci@0/pci@2/network@0,1'
net2: '/pci@7c0/pci@0/pci@2/network@0'
net1: '/pci@780/pci@0/pci@1/network@0,1'
net0: '/pci@780/pci@0/pci@1/network@0'
net: '/pci@780/pci@0/pci@1/network@0'
ide: '/pci@7c0/pci@0/pci@1/pci@0/ide@8'
cdrom: '/pci@7c0/pci@0/pci@1/pci@0/ide@8/cdrom@0,0:f'
disk3: '/pci@780/pci@0/pci@9/scsi@0/disk@3'
disk2: '/pci@780/pci@0/pci@9/scsi@0/disk@2'
disk1: '/pci@780/pci@0/pci@9/scsi@0/disk@1'
disk0: '/pci@780/pci@0/pci@9/scsi@0/disk@0'
disk: '/pci@780/pci@0/pci@9/scsi@0/disk@0'
scsi: '/pci@780/pci@0/pci@9/scsi@0'
virtual-console: '/virtual-devices/console@1'
name: 'aliases'

If you follow this procedure closely, you don’t have to bounce the box to make this modification. But keep in mind that messing around with these OBP variables, especially boot-device, without testing can leave your machine in a state where some other poor administrator has to figure out which disk is your real boot device. I take no responsibility for what you do with your OBP modifications. =)

Configuring ZFS to gracefully deal with pool failures

This article was posted by Matty on 2008-03-01 12:05:00 -0400

If you are running ZFS in production, you may have experienced a situation where your server panicked and rebooted when a ZFS file system was corrupted. With George Wilson’s recent putback of CR #6322646, this is no longer the case. George’s putback allows the file system administrator to set the “failmode” property to control what happens when a pool incurs a fault. Here is a description of the new property from the zpool(1m) manual page:

failmode=wait | continue | panic

Controls the system behavior in the event of catastrophic pool failure. This condition is typically a result of a loss of connectivity to the underlying storage device(s) or a failure of all devices within the pool. The behavior of such an event is determined as follows:

wait      Blocks all I/O access until the device connectivity is recovered and the errors are cleared. This is the default behavior.

continue  Returns EIO to any new write I/O requests but allows reads to any of the remaining healthy devices. Any write requests that have yet to be committed to disk would be blocked.

panic     Prints out a message to the console and generates a system crash dump.
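Since failmode is a pool property, it gets changed with zpool set. Here's a minimal sketch of a wrapper that only accepts the three documented values and prints the command instead of running it. failmode_cmd is a hypothetical helper name, and p1 is simply the test pool used later in this post.

```shell
# failmode_cmd: hypothetical helper that validates a failmode value and
# prints (does not execute) the corresponding zpool set command.
failmode_cmd() {
    pool=$1
    mode=$2
    case "$mode" in
        wait|continue|panic)
            echo "zpool set failmode=$mode $pool" ;;
        *)
            echo "invalid failmode: $mode" >&2
            return 1 ;;
    esac
}

failmode_cmd p1 continue    # prints: zpool set failmode=continue p1
```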

To see just how well this feature worked, I decided to test out the new failmode property. To begin my tests, I created a new ZFS pool from two files:

$ cd / && mkfile 1g file1 file2

$ zpool create p1 /file1 /file2

$ zpool status

  pool: p1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        p1          ONLINE       0     0     0
          /file1    ONLINE       0     0     0
          /file2    ONLINE       0     0     0

After the pool was created, I checked the failmode property:

$ zpool get failmode p1

NAME PROPERTY VALUE SOURCE
p1 failmode wait default

And then I began overwriting the start of one of the files with zeros to see what would happen:

$ dd if=/dev/zero of=/file1 bs=512 count=1024

$ zpool scrub p1

I was overjoyed to find that the box was still running, even though the pool showed up as faulted:

$ zpool status

  pool: p1
 state: FAULTED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: scrub completed after 0h0m with 0 errors on Tue Feb 19 13:57:41 2008
config:

        NAME        STATE     READ WRITE CKSUM
        p1          FAULTED      0     0     0  insufficient replicas
          /file1    UNAVAIL      0     0     0  corrupted data
          /file2    ONLINE       0     0     0

errors: No known data errors

But my joy didn’t last long, since the box became unresponsive after a few minutes, and panicked with the following string:

Feb 19 13:57:47 nevadadev genunix: [ID 603766 kern.notice] assertion failed: vdev_config_sync(rvd->vdev_child, rvd->vdev_children, txg) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/spa.c, line: 4130
Feb 19 13:57:47 nevadadev unix: [ID 100000 kern.notice]
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feab30 genunix:assfail3+b9 ()
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feabd0 zfs:spa_sync+5d2 ()
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feac60 zfs:txg_sync_thread+19a ()
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feac70 unix:thread_start+8 ()

Since the manual page states that the failmode property “controls the system behavior in the event of catastrophic pool failure,” it appears the box should have stayed up and operational when the pool became unusable. I filed a bug on the OpenSolaris website, so hopefully the ZFS team will get this issue addressed in the future.

Creating ZFS file systems during the jumpstart process

This article was posted by Matty on 2008-02-25 00:35:00 -0400

I use jumpstart at home to update the hosts in my lab as new Nevada builds and Solaris updates are released. As part of the unattended installation / upgrade process, I create a couple of ZFS file systems on each system. Since jumpstart doesn’t have built-in support for creating ZFS file systems, I had to add the zpool and zfs commands to my finish script. After a bit of tinkering around, here is what I came up with:

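# Locate the first local device on the system: entry "0." in format's disk
# list. Matching $1 exactly (and exiting) avoids picking up other lines.
DISK1=`echo quit | /usr/sbin/format 2>/dev/null | /usr/bin/awk '$1 == "0." { print $2; exit }'`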

# Create a ZFS pool using the disk device above
/usr/sbin/zpool create -f -R /a p0 ${DISK1}

# Create ZFS file systems
/usr/sbin/zfs create p0/home
/usr/sbin/zfs set mountpoint=/home p0/home
.....

This appears to work pretty well, and my boxes are now built and operational once the jumpstart process completes. Niiiiiiiice!
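One thing to watch with awk filters on format output: a loose pattern can match more than one line on a multi-disk box, and zpool create would then happily take every disk it is handed. Here's a sketch of a stricter filter, run against a made-up sample of format's disk list (the sample text is hypothetical, written in the usual layout, and first_disk is just an illustrative name).

```shell
# first_disk: hypothetical helper that prints the device name of entry
# "0." from format-style output read on stdin, then stops.
first_disk() {
    awk '$1 == "0." { print $2; exit }'
}

# Hypothetical sample of format's disk list, for illustration only.
first_disk <<'EOF'
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@780/pci@0/pci@9/scsi@0/sd@0,0
       1. c0t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@780/pci@0/pci@9/scsi@0/sd@1,0
EOF
```

On this sample the filter prints only c0t0d0, even though the second disk and both device paths also contain a zero.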