I previously discussed the OpenSSH Match directive, and how it can be used to chroot SSH and SFTP users. Over the past couple of months I’ve encountered some gotchas with the chroot implementation in OpenSSH. Since I had to work these out myself, I figured I would share my findings here so folks won’t need to spend hours looking at source code (if you want to geek out and see how this works, check out session.c in the OpenSSH source code).
The first gotcha occurs when the user’s home directory doesn’t have the correct permissions. The directory the user is chroot()’ed into needs to be owned by root and must not be writable by group or others, and in order for the user to see the contents of the top-level directory the group permissions need to be read/execute. A user who is going to be chroot()’ed into /chroot/user will need the following permissions:
$ mkdir /chroot/user
$ chmod 750 /chroot/user
$ chown root:user /chroot/user
If the permissions aren’t set 100% correctly you will be immediately disconnected:
$ sftp -o Port=222 user@192.168.1.25
Connecting to 192.168.1.25...
user@192.168.1.25's password:
Read from remote host 192.168.1.25: Connection reset by peer
Connection closed
And the following error will be written to your system logs:
Apr 23 14:06:27 vern sshd[13714]: fatal: bad ownership or modes for chroot directory "/chroot/user"
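The ownership and mode rules can be checked before a user ever connects. The following is a minimal sketch rather than a copy of the sshd logic: it assumes GNU stat (for the -c option) and, as far as I can tell from session.c, that every directory from the chroot target up to / must be owned by root and must not be writable by group or others. The function names are my own.

```shell
#!/bin/bash
# Sketch: pre-flight check for a chroot directory path.
# Assumption: each path component must be owned by uid 0 and must not
# have the group/other write bits (mask 022) set.

# Return 0 if the given owner uid and octal mode would be acceptable.
mode_ok() {
    local uid="$1" mode="$2"
    # The leading 0 makes the shell read the mode as an octal number.
    [ "$uid" -eq 0 ] && [ $(( 0$mode & 022 )) -eq 0 ]
}

# Walk from the target directory up to / and report the first bad component.
check_chroot_path() {
    local path="$1" info uid mode
    case "$path" in
        /*) ;;
        *) echo "need an absolute path: $path"; return 2 ;;
    esac
    while :; do
        info=$(stat -c '%u %a' "$path") || return 1
        uid=${info% *}
        mode=${info#* }
        if ! mode_ok "$uid" "$mode"; then
            echo "bad ownership or modes: $path (uid=$uid mode=$mode)"
            return 1
        fi
        [ "$path" = / ] && break
        path=$(dirname "$path")
    done
    echo "ok"
}
```

Running check_chroot_path /chroot/user on the server should print the first directory that would trip the check, which saves a round trip through the logs.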
There is also an issue with directory write-ability. Users will not be able to write to the top-level directory of the chroot() due to the strict permissions that are required. If you want your user to be able to create files and directories, they will need to change into a subdirectory below “/” that they own before doing so. If you want the user to land in that directory at login, you can put it in the home directory field in /etc/passwd (the path is interpreted relative to the chroot):
$ grep user /etc/passwd
user:x:502:502::/home:/bin/bash
In this example user will be chroot’ed into /chroot/user, then a chdir() will occur and the user will be placed into the directory /home (which is /chroot/user/home on the server):
$ sftp -o Port=222 user@192.168.1.25
Connecting to 192.168.1.25...
user@192.168.1.25's password:
sftp> pwd
Remote working directory: /home
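Since the top level of the chroot has to stay root-owned, the writable area has to be a subdirectory that the user owns. A minimal sketch of that setup follows; the function name and the 755 mode are my own choices, and on a real server it would be run as root with the account's uid and gid (502:502 in the passwd entry above).

```shell
#!/bin/bash
# Sketch: create a user-writable directory inside a root-owned chroot.
# make_writable_subdir is a hypothetical helper, not part of OpenSSH.
make_writable_subdir() {
    local chroot_dir="$1" subdir="$2" owner="$3"
    mkdir -p "$chroot_dir/$subdir" || return 1
    # Hand the subdirectory to the user; the chroot top level is untouched.
    chown "$owner" "$chroot_dir/$subdir" || return 1
    chmod 755 "$chroot_dir/$subdir"
}

# Example (as root): make_writable_subdir /chroot/user home 502:502
```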
With a little understanding, the OpenSSH chroot() implementation can be made quite usable.
If you are using SSH key-based authentication you should be encrypting your private key. This ensures that if someone breaks into your server and steals your keys, they won’t be able to use them to access other systems. If your private key isn’t encrypted you can use the ssh-keygen utility’s “-p” option to do so:
$ ssh-keygen -p -f id_dsa
Enter old passphrase:
Key has comment 'id_dsa'
Enter new passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved with the new passphrase.
This option can be used to change the passphrase used to encrypt a private key, and to add a passphrase to an existing private key. Viva la OpenSSH!
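The same change can be scripted. ssh-keygen accepts the old passphrase with -P and the new one with -N, so the following sketch adds a passphrase without prompting. The function name is made up, and passing passphrases as arguments exposes them to anyone who can list processes, so treat this as a lab-only illustration.

```shell
#!/bin/bash
# Sketch: add a passphrase to an unencrypted private key without prompts.
# add_passphrase is a hypothetical wrapper; -p, -f, -P and -N are
# standard ssh-keygen options.
add_passphrase() {
    local keyfile="$1" new_pass="$2"
    # -P '' supplies the (empty) old passphrase of an unencrypted key.
    ssh-keygen -p -f "$keyfile" -P '' -N "$new_pass"
}

# Example: add_passphrase ~/.ssh/id_dsa 'a long passphrase'
```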
The devops movement (if you haven’t seen Ben Rockwood’s presentation on devops you should go watch it now) has been gaining steam over the past few years, and the movement has led to a lot of organizations adopting automation solutions like CFEngine, Chef or Puppet. I’ve had great success with Puppet so far, and my fellow blogging partner Mike has had similar success with CFEngine. Maybe Mike will come out of hibernation and give everyone an update on the cool things he’s done with it. :)
When I first started using Puppet I purchased a copy of James Turnbull’s Pulling Strings with Puppet. I’m your atypical learner who likes to sit in a recliner with a hard cover book and read it from beginning to end. Then I will go back through each chapter and experiment with the items I highlighted. This book had A LOT of material to experiment with, and is broken up into seven chapters:
Chapter 1. Introducing Puppet
Chapter 2. Installing and Running Puppet
Chapter 3. Speaking Puppet
Chapter 4. Using Puppet
Chapter 5. Reporting on Puppet
Chapter 6. Advanced Puppet
Chapter 7. Extending Puppet
The introduction and installation chapters helped me understand the Puppet architecture, and the purpose of the various daemons that run on the puppetmaster. I also enjoyed the description of facter, which is one of the most useful software applications I’ve come across (and it’s useful outside of Puppet). The description of the runtime flags was also extremely handy, and it made debugging my initial configuration issues quite easy.
The chapter on using puppet was also well put together. It did a good job of describing classes, resources, inheritance, variables, scoping, arrays, conditionals, nodes, facts and resource types. I’ve referenced this chapter many times since putting together my first set of manifests, and when you combine it with the latest resource type descriptions on the puppet site you will have everything you need to define your resources.
I found the last three chapters useful for seeing how to deploy Puppet in a real-world scenario. They also touched on how to tie a version control system into Puppet, which is a must when you are centralizing your configuration management duties. These chapters also touched on external node classifiers, which allow you to retrieve a set of nodes from an external source (LDAP, CMDB, etc.). This becomes essential when you are managing thousands of machines, since it becomes a pain to constantly be editing files when devices are added or removed (I’m assuming you have an automated solution to handle node additions and removals from your network).
Pulling Strings with Puppet was a solid book, and I would definitely give it a 5/5. Early on I stumbled trying to figure out how to classify everything, but after a lot of trial and error I’ve finally come up with a layout that does everything I need and is easy to manage. It’s also a piece of cake to extend my configuration as new services come online. I’m hoping to start playing around with Chef next month, and plan to do a side-by-side comparison of the two later this year. There are things I like about each solution, so this should be a whole bunch of fun!!
Software fails, and it often fails at the wrong time. When failures occur I want to understand why, and will usually start piecing together the events that led up to the issue. Some application issues can be root caused by reviewing logs, but catastrophic crashes will often require the admin to sit down with gdb and review a core file if it exists.
Solaris has always led the charge when it comes to reliably creating core files during crashes. System crashes will cause core files to be dumped to /var/crash, and the coreadm utility can be used to save application core files. Linux has been playing catch-up in this realm, and only in the past couple of years started providing diskdump and netdump to generate kernel crash dumps when an OOPS occurs (in my experience these tools aren’t as reliable as their Solaris counterparts though).
In the latest releases of Fedora, the automated bug-reporting tool (abrt) infrastructure was added to generate core files when processes crashed. The abrt website provides the following description for their automated crash dump collection infrastructure:
“abrt is a daemon that watches for application crashes. When a crash occurs, it collects the crash data (core file, application’s command line etc.) and takes action according to the type of application that crashed and according to the configuration in the abrt.conf configuration file. There are plugins for various actions: for example to report the crash to Bugzilla, to mail the report, to transfer the report via FTP or SCP, or to run a specified application.”
This is a welcome addition to the Linux family, and it works pretty well from my initial testing. To see abrt in action I created a script and sent it a SIGSEGV:
$ cat loop
#!/bin/bash
while :; do
    sleep 1
done
$ ./loop &
[1] 22790
$ kill -SIGSEGV 22790
[1]+ Segmentation fault (core dumped) ./loop
Once the fault occurred I tailed /var/log/messages to get the crash dump location:
$ tail -20 /var/log/messages
Jan 17 09:52:08 theshack abrt[20136]: Saved core dump of pid 20126 (/bin/bash) to /var/spool/abrt/ccpp-2012-01-17-09:52:08-20126 (487424 bytes)
Jan 17 09:52:08 theshack abrtd: Directory 'ccpp-2012-01-17-09:52:08-20126' creation detected
Jan 17 09:52:10 theshack abrtd: New problem directory /var/spool/abrt/ccpp-2012-01-17-09:52:08-20126, processing
If I change into the directory referenced above I will see a wealth of debugging data, including a core file from the application (bash) that crashed:
$ cd /var/spool/abrt/ccpp-2012-01-17-09:52:08-20126
$ ls -la
total 360
drwxr-x---. 2 abrt root 4096 Jan 17 09:52 .
drwxr-xr-x. 4 abrt abrt 4096 Jan 17 09:52 ..
-rw-r-----. 1 abrt root 5 Jan 17 09:52 abrt_version
-rw-r-----. 1 abrt root 4 Jan 17 09:52 analyzer
-rw-r-----. 1 abrt root 6 Jan 17 09:52 architecture
-rw-r-----. 1 abrt root 16 Jan 17 09:52 cmdline
-rw-r-----. 1 abrt root 4 Jan 17 09:52 component
-rw-r-----. 1 abrt root 487424 Jan 17 09:52 coredump
-rw-r-----. 1 abrt root 1 Jan 17 09:52 count
-rw-r-----. 1 abrt root 649 Jan 17 09:52 dso_list
-rw-r-----. 1 abrt root 2110 Jan 17 09:52 environ
-rw-r-----. 1 abrt root 9 Jan 17 09:52 executable
-rw-r-----. 1 abrt root 8 Jan 17 09:52 hostname
-rw-r-----. 1 abrt root 19 Jan 17 09:52 kernel
-rw-r-----. 1 abrt root 2914 Jan 17 09:52 maps
-rw-r-----. 1 abrt root 25 Jan 17 09:52 os_release
-rw-r-----. 1 abrt root 18 Jan 17 09:52 package
-rw-r-----. 1 abrt root 5 Jan 17 09:52 pid
-rw-r-----. 1 abrt root 4 Jan 17 09:52 pwd
-rw-r-----. 1 abrt root 51 Jan 17 09:52 reason
-rw-r-----. 1 abrt root 10 Jan 17 09:52 time
-rw-r-----. 1 abrt root 1 Jan 17 09:52 uid
-rw-r-----. 1 abrt root 5 Jan 17 09:52 username
-rw-r-----. 1 abrt root 40 Jan 17 09:52 uuid
-rw-r-----. 1 abrt root 620 Jan 17 09:52 var_log_messages
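Most of the entries in the listing above are small text files, so a quick summary is easy to script. Here is a small sketch; the function name and the choice of fields are mine, and it assumes each file holds a single short value, as the sizes in the listing suggest.

```shell
#!/bin/bash
# Sketch: print selected fields from an abrt problem directory.
# summarize_problem is a hypothetical helper, not an abrt tool.
summarize_problem() {
    local dir="$1" field
    for field in executable cmdline pid reason; do
        # Skip fields that are missing or unreadable.
        if [ -r "$dir/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dir/$field")"
        fi
    done
}

# Example: summarize_problem /var/spool/abrt/ccpp-2012-01-17-09:52:08-20126
```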
If we were debugging the crash we could poke around the saved environment files and then fire up gdb with the core dump to see where it crashed:
$ gdb /bin/bash coredump
GNU gdb (GDB) Fedora (7.3.50.20110722-10.fc16)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /bin/bash...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 20126]
Core was generated by `/bin/bash ./loop'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000344aabb83e in waitpid () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bash-4.2.20-1.fc16.x86_64
(gdb) backtrace
#0 0x000000344aabb83e in waitpid () from /lib64/libc.so.6
#1 0x0000000000440679 in ?? ()
#2 0x00000000004417cf in wait_for ()
#3 0x0000000000432736 in execute_command_internal ()
#4 0x0000000000434dae in execute_command ()
#5 0x00000000004351c5 in ?? ()
#6 0x000000000043150b in execute_command_internal ()
#7 0x0000000000434dae in execute_command ()
#8 0x000000000041e0b1 in reader_loop ()
#9 0x000000000041c8ef in main ()
From there you could navigate through the saved memory image to see what caused the program to die an unexpected death. Now you may be asking yourself how exactly does abrt work? After digging through the source code I figured out that it installs a custom hook using the core_pattern kernel entry point:
$ cat /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e
Each time a crash occurs the kernel will invoke the hook above, passing a number of arguments to the program listed in the core_pattern proc entry. From what I have derived from the source so far there are currently hooks for C/C++ and Python applications, and work is in progress to add support for Java. This is really cool stuff.
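To make the mechanism concrete: when core_pattern begins with a pipe, the kernel runs the named program and feeds it the core image on standard input, with the % specifiers expanded into arguments. The following is a toy stand-in for such a hook, not abrt's actual code (which does far more). The function name, directory layout and file names are invented for illustration, and it only uses the signal, pid and time values (%s, %p and %t).

```shell
#!/bin/bash
# Toy core_pattern-style hook: save stdin (the core image) plus metadata.
# save_core and its directory layout are hypothetical.
save_core() {
    local spool="$1" signal="$2" pid="$3" time="$4"
    local dir="$spool/core-$time-$pid"
    mkdir -p "$dir" || return 1
    cat > "$dir/coredump" || return 1   # the core image arrives on stdin
    printf '%s\n' "$signal" > "$dir/signal"
    printf '%s\n' "$pid" > "$dir/pid"
    echo "$dir"                         # report where the dump was stored
}

# Example: save_core /var/tmp/cores 11 20126 1326811928 < some-core-file
```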
I recently moved a bind installation from CentOS 5 to CentOS 6. As part of the move I built out a new server with CentOS 6, staged the bind chroot packages and then proceeded to copy all of the zone files from the CentOS 5 server to the CentOS 6 server. Once all the pieces were in place I attempted to start up bind. This failed, and I was greeted with the following error:
$ service named start
Starting named:
Error in named configuration: [FAILED]
There wasn’t anything in /var/log/messages to specifically state what the problem was, though when I reviewed the bind log file I noticed there were several “not loaded due to errors” messages in it:
$ grep "not loaded due to errors" named.log
07-Jan-2012 21:00:03.505 general: error: zone prefetch.net/IN: NS 'ns1.prod.prefetch.net' has no address records (A or AAAA)
07-Jan-2012 21:00:03.505 general: error: zone prefetch/IN: NS 'ns2.prod.prefetch.net' has no address records (A or AAAA)
07-Jan-2012 21:00:03.505 general: error: zone prefetch/IN: not loaded due to errors.
After reviewing the errors I noticed that the problematic zone files (I was not the original author of these) were configured to use forward references to entries in subzone files. This is a no-no, and it looks like the bind shipped with CentOS 5 tolerates forward references while the one shipped with CentOS 6 does not. To allow me to bring up the server while I tracked down all of the offending zone files I set DISABLE_ZONE_CHECKING to yes in /etc/sysconfig/named:
$ grep DISABLE /etc/sysconfig/named
DISABLE_ZONE_CHECKING="yes"
This allowed me to test the server to make sure it worked, and I will get the zone files corrected and run through a zone file consistency check utility in the coming days. If you are moving from CentOS 5 to CentOS 6 you might want to watch out for this (ideally you would already have properly structured zone files!).
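BIND ships a named-checkzone utility that takes a zone name and a zone file, so checking a directory of zones can be scripted. This is a sketch under one big assumption: that each file is named after its zone with a .zone suffix (for example prefetch.net.zone), which may not match your layout. The CHECKZONE variable is my own addition so a different checker can be swapped in.

```shell
#!/bin/bash
# Sketch: run a zone checker over every *.zone file in a directory.
# Assumes files are named <zone>.zone; adjust for your own layout.
CHECKZONE=${CHECKZONE:-named-checkzone}

check_zones() {
    local dir="$1" file zone failed=0
    for file in "$dir"/*.zone; do
        [ -e "$file" ] || continue      # no matches: the glob stays literal
        zone=$(basename "$file" .zone)
        # The checker is called as: <checker> <zone name> <zone file>
        if ! "$CHECKZONE" "$zone" "$file" >/dev/null; then
            echo "FAILED: $zone ($file)"
            failed=1
        fi
    done
    return "$failed"
}

# Example: check_zones /var/named/chroot/var/named
```

The function returns non-zero if any zone fails, so it can gate a restart of named once DISABLE_ZONE_CHECKING is turned back off.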