The Linux Logical Volume Manager (LVM) provides a relatively easy way to combine block devices into a pool of storage from which you can allocate space. In LVM terminology, there are three main concepts:
Physical volumes (PVs): the block devices (whole disks or partitions) that are handed over to LVM.
Volume groups (VGs): pools of storage built from one or more physical volumes.
Logical volumes (LVs): the chunks of space carved out of a volume group that you put file systems (or swap, raw database storage, etc.) on.
When you use LVM to manage your storage, you will typically do something similar to this when new storage requests are made:
Initialize the new block devices as physical volumes with pvcreate.
Add the physical volumes to a new or existing volume group with vgcreate or vgextend.
Carve a logical volume out of the volume group with lvcreate (or grow an existing one with lvextend).
Create a file system on the logical volume and mount it.
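Here is a minimal sketch of that workflow; the device names, the volume group name (DataVG) and the logical volume name (data01) are just examples, as is the choice of file system:
$ pvcreate /dev/sdb /dev/sdc                                # initialize the block devices as physical volumes
$ vgcreate DataVG /dev/sdb /dev/sdc                         # build a volume group out of them
$ lvcreate -n data01 -L 10G DataVG                          # carve out a 10GB logical volume
$ mkfs -t ext4 /dev/DataVG/data01                           # create a file system on it
$ mkdir -p /data01 && mount /dev/DataVG/data01 /data01      # and mount it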
With this approach you can end up with free space in one or more physical volumes or volume groups, depending on how you provisioned the storage. To see how much free space your physical volumes have you can run the pvs utility without any arguments:
$ pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 VolGroup lvm2 a-- 8.51g 0
/dev/sdb DataVG lvm2 a-- 18.00g 18.00g
/dev/sdc DataVG lvm2 a-- 18.00g 184.00m
The “PFree” column shows the free space for each physical volume in the system. To see how much free space your volume groups have you can run the vgs utility without any arguments:
$ vgs
VG #PV #LV #SN Attr VSize VFree
DataVG 2 1 0 wz--n- 35.99g 18.18g
VolGroup 1 2 0 wz--n- 8.51g 0
In the vgs output the “VFree” column shows the amount of free space in each volume group. LVM is nice, but I’m definitely a ZFS fan when it comes to storage management. I’m hopeful that Oracle will come around and port ZFS to Linux, since it would benefit a lot of users and might help repair some of the strained relations between Oracle and the opensource community. I may be too much of an optimist though.
I’ve been a long-time follower of the OpenBSD project and their amazing work on detecting stack and heap overflows and protecting the kernel and applications against them. Several of the concepts developed by the OpenBSD team made their way into Linux by way of the exec-shield project. Of the many useful security features that are part of exec-shield, the two that a SysAdmin can control directly are address space layout randomization and the exec-shield operating mode.
Address space randomization is controlled through the kernel.randomize_va_space sysctl tunable, which defaults to 1 on my CentOS systems:
$ sysctl kernel.randomize_va_space
kernel.randomize_va_space = 1
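If you want a different setting (a value of 2 additionally randomizes the heap), it can be changed on the fly with sysctl -w and made persistent by adding it to /etc/sysctl.conf; the value of 2 here is just an example:
$ sysctl -w kernel.randomize_va_space=2
$ echo "kernel.randomize_va_space = 2" >> /etc/sysctl.conf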
The exec-shield operating mode is controlled through the kernel.exec-shield sysctl value, and can be set to one of the following four modes (the descriptions below came from Steve Grubb’s excellent post on exec-shield operating modes):
A value of 0 completely disables ExecShield and Address Space Layout Randomization
A value of 1 enables them ONLY if the application bits for these protections are set to “enable”
A value of 2 enables them by default, except if the application bits are set to “disable”
A value of 3 enables them always, whatever the application bits
The default exec-shield value on my CentOS servers is 1, which enables exec-shield for applications that have been compiled to support it:
$ sysctl kernel.exec-shield
kernel.exec-shield = 1
To view the list of running processes that have exec-shield enabled, you can run Ingo Molnar and Ulrich Drepper’s lsexec utility:
$ lsexec --all |more
init, PID 1, UID root: no PIE, no RELRO, execshield enabled
httpd, PID 11689, UID apache: DSO, no RELRO, execshield enabled
httpd, PID 11691, UID apache: DSO, no RELRO, execshield enabled
httpd, PID 11692, UID apache: DSO, no RELRO, execshield enabled
httpd, PID 11693, UID apache: DSO, no RELRO, execshield enabled
httpd, PID 12224, UID apache: DSO, no RELRO, execshield enabled
httpd, PID 12236, UID apache: DSO, no RELRO, execshield enabled
pickup, PID 16181, UID postfix: DSO, partial RELRO, execshield enabled
appLoader, PID 2347, UID root: no PIE, no RELRO, execshield enabled
auditd, PID 2606, UID root: DSO, partial RELRO, execshield enabled
audispd, PID 2608, UID root: DSO, partial RELRO, execshield enabled
restorecond, PID 2629, UID root: DSO, partial RELRO, execshield enabled
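If you don’t have the lsexec utility handy, you can approximate the PIE and RELRO columns with readelf (the httpd path below is just an example). A “Type: DYN” in the ELF header indicates a PIE or DSO, a GNU_RELRO program header indicates RELRO, and a BIND_NOW entry in the dynamic section is what upgrades partial RELRO to full RELRO:
$ readelf -h /usr/sbin/httpd | grep 'Type:'
$ readelf -l /usr/sbin/httpd | grep GNU_RELRO
$ readelf -d /usr/sbin/httpd | grep BIND_NOW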
In this day and age of continuous security threats there is little to no reason not to be using these amazing technologies. When you combine exec-shield, SELinux, proper patching and security best practices, you can really limit the attack vectors that can be used to break into your systems.
I’ve been looking at some opensource scheduling packages, and while doing my research I came across the fcron package. Fcron is a replacement for vixie cron and anacron, and provides a number of super useful features, including anacron-style execution of jobs that were missed while a machine was down, scheduling based on system load, and per-job control over things like nice values and mail delivery.
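As a quick sketch of how the entries look: fcrontab is fcron’s equivalent of crontab (fcrontab -e to edit, fcrontab -l to list), and an @-style line is scheduled on elapsed run time instead of wall-clock time, which is what lets jobs catch up on machines that aren’t powered on around the clock. The script path below is made up:
@ 1d /usr/local/bin/cleanup.sh
That entry asks fcron to run the script roughly once per day of run time.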
My initial testing has been positive, and I definitely plan to keep this package in my back pocket. I’m still looking at various opensource schedulers, and if you have any experience in this area please leave me a comment. I’m curious which solutions worked well for my readers. :)
I talked about the ZFS scrub feature a few months back. In the latest Solaris 10 update the developers added additional scrub statistics, which are quite handy for figuring out throughput and estimated completion times:
$ zpool scrub rpool
$ zpool status -v
pool: rpool
state: ONLINE
scan: scrub in progress since Tue Dec 6 07:45:31 2011
1005M scanned out of 81.0G at 29.5M/s, 0h46m to go
0 repaired, 1.21% done
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
c1t0d0s0 ONLINE 0 0 0
errors: No known data errors
This sure beats the previous output! Nice job team Solaris.
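One related tidbit: if a scrub gets kicked off at an inconvenient time, it can be stopped with the -s flag:
$ zpool scrub -s rpool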
In a previous post I talked about my problems getting gluster to expand the number of replicas in a volume. While experimenting with the gluster utility’s “add-brick” option, I wanted to see if adding two more bricks would replicate the existing data across all four bricks (two old, two new), or if the two new bricks would become one replica pair and the two previous bricks would remain the other. To see what would happen I added two more bricks:
$ gluster volume add-brick glustervol01 centos-cluster01.homefetch.net:/gluster/vol01 centos-cluster02.homefetch.net:/gluster/vol01
Add Brick successful
And then checked out the status of the volume:
$ gluster volume info glustervol01
Volume Name: glustervol01
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: fedora-cluster01.homefetch.net:/gluster/vol01
Brick2: fedora-cluster02.homefetch.net:/gluster/vol01
Brick3: centos-cluster01.homefetch.net:/gluster/vol01
Brick4: centos-cluster02.homefetch.net:/gluster/vol01
Interesting. The volume is now a distributed-replicate volume with a two by two configuration, giving four bricks in total. This configuration is similar to RAID 10, where you stripe across mirrors. The two previous bricks form one mirror (replica pair), and the two new bricks form the second. I confirmed this by copying files to my gluster file system and then checking the bricks to see where the files landed:
$ cd /gluster
$ cp /etc/services file1
$ cp /etc/services file2
$ cp /etc/services file3
$ cp /etc/services file4
$ ls -la
total 2648
drwxr-xr-x 4 root root 8192 Nov 27 2011 .
dr-xr-xr-x. 23 root root 4096 Nov 12 15:44 ..
drwxr-xr-x 2 root root 16384 Nov 27 2011 etc1
-rw-r--r-- 1 root root 656517 Nov 27 2011 file1
-rw-r--r-- 1 root root 656517 Nov 27 2011 file2
-rw-r--r-- 1 root root 656517 Nov 27 2011 file3
-rw-r--r-- 1 root root 656517 Nov 27 2011 file4
drwx------ 2 root root 20480 Nov 26 21:11 lost+found
Four files were copied to the gluster file system, and it looks like two landed on each replicated pair of bricks. Here is the ls listing from the first pair (I pulled this from one of the two nodes):
$ ls -la
total 1328
drwxr-xr-x. 4 root root 4096 Nov 27 10:00 .
drwxr-xr-x. 3 root root 4096 Nov 26 17:53 ..
drwxr-xr-x. 2 root root 4096 Nov 27 10:00 etc1
-rw-r--r--. 1 root root 656517 Nov 27 10:00 file1
-rw-r--r--. 1 root root 656517 Nov 27 10:01 file2
drwx------. 2 root root 16384 Nov 26 21:11 lost+found
And here is the listing from the second replicated pair of bricks:
$ ls -la
total 1324
drwxr-xr-x 4 root root 4096 Nov 27 10:00 .
drwxr-xr-x 3 root root 4096 Nov 12 20:05 ..
drwxr-xr-x 126 root root 12288 Nov 27 10:00 etc1
-rw-r--r-- 1 root root 656517 Nov 27 10:00 file3
-rw-r--r-- 1 root root 656517 Nov 27 10:00 file4
drwx------ 2 root root 4096 Nov 26 21:11 lost+found
So there you have it. Adding two more bricks with “add-brick” adds a new pair of replicated bricks; it doesn’t mirror the data between the old bricks and the new ones. Given the description of a distributed replicated volume in the official documentation, this makes total sense. Now to play around with some of the other redundancy types.
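As a side note, newer gluster releases also let you query file placement from the client mount through a virtual extended attribute, which saves logging into each brick. A quick sketch against one of the files above (run on the machine where /gluster is mounted):
$ getfattr -n trusted.glusterfs.pathinfo /gluster/file1
The output should list the backend brick paths that hold the file, lining up with the ls listings shown earlier.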