Finding approximate matches in a data file with agrep

A few weeks back I ran into a situation that required me to locate a data given a file with various variations of that data. I proceeded to grep for each form of the string (e.g., “teh”, “the”, “tte”) I could think of, but wasn’t getting the results I wanted. After a bit of poking around, I came across the incredibly useful agrep utility. This utility allows you to look for approximate matches in files, specifying the number of variations that can occur. If you were given a file with various variations of the string “the”:

$ cat input.txt

You could locate each string by running agrep with the string you want to look for and a variation of 1:

$ agrep -1 the input.txt

This is a useful utility and one I hope my fellow SysAdmins enjoy. Hope everyone had a merry Christmas!

Sorting data by dates, numbers and much much more

Every year or two I try to re-read manual pages and documentation about my favorite UNIX tools (bash, awk, sed, grep, etc.). Each time I do this I pick up some cool new nugget of information, and refresh my mind on things that I may have forgot. While reading through an article on sort, I came across the following note about the sort “-k” (field to sort by) option:

“Further modifications of the sorting algorithm are possible with these options: -d (use only letters, digits, and blanks for sort keys), -f (turn off case recognition and treat lowercase and uppercase characters as identical), -i (ignores non-printing ASCII characters), -M (sorts lines using three-letter abbreviations of month names: JAN, FEB, MAR, …), -n (sorts lines using only digits, -, and commas, or other thousands separator). These options, as well as -b and -r, can be used as part of a key number, in which case they apply to that key only and not globally, like they do when they are used outside key definitions.”

This is crazy useful, and I didn’t realize sort could be used to sort by date. I put this to use today, when I had to sort a slew of data that looked similar to this:

Jun 10 05:17:47 some_data_string
May 20 05:17:48 some_data_string2
Jun 17 05:17:49 some_data_string0

I was able to first sort by the month, and then by the day of the month:

$ awk ‘{printf “%-3s %-2s %-8s %-50s\n”, $1, $2, $3, $4 }’ data | sort -k1M -k2n
May 17 05:17:49 some_data_string0
Jun 01 05:17:47 some_data_string
Jun 20 05:17:48 some_data_string2

Awesome stuff, and I will definitely be using this again in the future!!!

How to undelete any open, deleted file on linux / solaris

Chris Dew wrote up a neat trick on how to recover files if deleted on Linux, yet still open by a process.

This works on Solaris as well.  =)

$:~:uname -a
SunOS 5.10 Generic_127112-11 i86pc i386 i86pc

$:~:echo “sup folks?” > testfile
$:~:tail -f testfile &
[1] 17134

$:~:rm testfile
$:~:ls /proc/17134/fd/
0  1  2
$:~:cat /proc/17134/fd/0
sup folks?
$:~:cp !$ ./testfile
cp /proc/17134/fd/0 ./testfile
$:~:cat testfile
sup folks?

Helpful shell shortcuts

So this may be a little basic, but I find myself using these two shortcuts quite a bit while at the shell.

If you ever find yourself wanting to “reuse” the last argument in a command — for example, here I move a file from one location into /var/tmp and I want to “cd” into /var/tmp without having to type it, use the shell variable !$…

(svoboda)> dd if=/dev/zero of=/tmp/blah bs=1024000 count=1
1+0 records in
1+0 records out
1024000 bytes (1.0 MB) copied, 0.0109023 s, 93.9 MB/s

(svoboda)> mv /tmp/blah /var/tmp

(svoboda)> cd !$
cd /var/tmp

(svoboda)> pwd

If you wanted to “preface” your last command, you can throw anything you want into the shell followed by the !! shell shortcut.
(svoboda)> “Armin van Buuren’s a State of Trance”
-bash: Armin van Buuren’s a State of Trance: command not found

(svoboda)> echo !!
echo “Armin van Buuren’s a State of Trance”
Armin van Buuren’s a State of Trance

The first line mearly “shows” what is being executed, with the second line executing the actual command.  Not rocket science, but whatever helps on saving keystrokes!

Deciphering shell exit codes

I was recently debugging an issue with a shell script, and noticed that the shell was exiting with an exit code greater than 100 when it received a SIGTSTP signal:

$ cat test

sleep 60

# Window one
$ ./test
[1]+ Stopped ./test
Home:~ matty$ echo $?

# Window two
$ kill -18 4667

I was curious where the exit value of 146 came from, so I did a bit of digging. It turns out that when a shell exits due to an uncaught signal, the signal number is added to 128 and that is the value that is returned. So in the case above, the exit code 146 was returned. I digs me some random shell knowledge.

Bash’s built in commands

If you’re a frequent user of the bash shell, I would suggest taking a peek at the GNU reference guide next time you have a chance.  There are a lot of cool built in functions/commands within bash that are pretty neat.  To get an idea of what these built in commands are:

$ ps
15997 pts/2    00:00:00 bash
23625 pts/2    00:00:00 ps

$ which enable
/usr/bin/which: no enable in (/bin:/usr/bin:/usr/sbin:/sbin)

$ enable
enable .
enable :
enable [
enable alias
enable bg
enable bind
enable break
enable builtin
enable caller
enable cd
enable command
enable compgen
enable complete
enable continue
enable declare
enable dirs
enable disown
enable echo
enable enable
enable eval
enable exec
enable exit
enable export
enable false
enable fc
enable fg
enable getopts
enable hash
enable help
enable history
enable jobs
enable kill
enable let
enable local
enable logout
enable popd
enable printf
enable pushd
enable pwd
enable read
enable readonly
enable return
enable set
enable shift
enable shopt
enable source
enable suspend
enable test
enable times
enable trap
enable true
enable type
enable typeset
enable ulimit
enable umask
enable unalias
enable unset
enable wait

Some of these do have binaries within the /usr or /bin namespace, while others do not.  Bash’s internal built in definition of these commands is what actually gets executed…

$ which cd
/usr/bin/which: no cd in (/bin:/usr/bin:/usr/sbin:/sbin)
$ which echo
$ which eval
/usr/bin/which: no eval in (/bin:/usr/bin:/usr/sbin:/sbin)
$ which exit
/usr/bin/which: no exit in (/bin:/usr/bin:/usr/sbin:/sbin)
$ which kill

Some of these are pretty intuitive why they should be built-in commands (cd, exit, etc.)  Looking through the GNU reference guide describes what all of these do.  There is also a built-in called “help” which describes some of this without going to the GNU reference guide.

$ which help
/usr/bin/which: no help in (/bin:/usr/bin:/usr/sbin:/sbin)
$ help cd
cd: cd [-L|-P] [dir]
Change the current directory to DIR.  The variable $HOME is the
default DIR.  The variable CDPATH defines the search path for
the directory containing DIR.  Alternative directory names in CDPATH
are separated by a colon (:).  A null directory name is the same as
the current directory, i.e. `.’.  If DIR begins with a slash (/),
then CDPATH is not used.  If the directory is not found, and the
shell option `cdable_vars’ is set, then try the word as a variable
name.  If that variable has a value, then cd to the value of that
variable.  The -P option says to use the physical directory structure
instead of following symbolic links; the -L option forces symbolic links
to be followed.
I found the “bind” built in to be super useful.  Bash has shortcuts that allow you to move the cursor around, search history, etc. and I always find myself forgetting what the keyboard shortcut / sequence is.  When this happens, I just use the “bind” (not DNS!) built in to find what I’m looking for…

$ bind -p

Displays all “bindings” of keyboard shortcuts to variables.   Here, I want to see what keyboard shortcut I can use to search through my command history to find a command…

$ bind -p | grep reverse-search-history
“\C-r”: reverse-search-history

Ah, so its Ctrl-r..  Sure enough, hitting ctrl-r at a command prompt brings up the reverse search prompt through my history and I can look for any command…  Looking for the command “cat” seems to have found me looking through /etc/passwd.


(reverse-i-search)`cat’: cat /etc/passwd

Hitting ctrl-r multiple times once command is found will continue to search backwards through the history.

(reverse-i-search)`cat’: cat /proc/kallsyms | more

Reverse history search is pretty useful, but there are so many other neat features of bash that I find invaluable.  What if I have created some super long command with multiple pipes and I want to move the cursor to the beginning of the line?

$ bind -p | grep beginning-of-line
“\C-a”: beginning-of-line
“\eOH”: beginning-of-line
“\e[1~”: beginning-of-line
“\e[H”: beginning-of-line

Well, it looks like there are a few defined shortcuts, but ctrl-a seems to be a winner.  Sure enough..

$ cat /etc/passwd | awk -F: ‘{print $7}’ | sed ‘s,bin,mike,'[]

$ [c]at /etc/passwd | awk -F: ‘{print $7}’ | sed ‘s,bin,mike,’

Take a peek!  You might find yourself saving some keystrokes.

$ help bind
bind: bind [-lpvsPVS] [-m keymap] [-f filename] [-q name] [-u name] [-r keyseq] [-x keyseq:shell-command] [keyseq:readline-function or readline-command]
Bind a key sequence to a Readline function or a macro, or set
a Readline variable.  The non-option argument syntax is equivalent
to that found in ~/.inputrc, but must be passed as a single argument:
bind ‘”\C-x\C-r”: re-read-init-file’.
bind accepts the following options:
-m  keymap         Use `keymap’ as the keymap for the duration of this
command.  Acceptable keymap names are emacs,
emacs-standard, emacs-meta, emacs-ctlx, vi, vi-move,
vi-command, and vi-insert.
-l                 List names of functions.
-P                 List function names and bindings.
-p                 List functions and bindings in a form that can be
reused as input.
-r  keyseq         Remove the binding for KEYSEQ.
-x  keyseq:shell-command  Cause SHELL-COMMAND to be executed when
KEYSEQ is entered.
-f  filename       Read key bindings from FILENAME.
-q  function-name  Query about which keys invoke the named function.
-u  function-name  Unbind all keys which are bound to the named function.
-V                 List variable names and values
-v                 List variable names and values in a form that can
be reused as input.
-S                 List key sequences that invoke macros and their values
-s                 List key sequences that invoke macros and their values
in a form that can be reused as input.

Want to see if your current shell has history enabled, or some other key bash features?  The GNU Bash Reference guide has details on what all of these variables are.. Like checkwinsize.

$ shopt
cdable_vars     off
cdspell         off
checkhash       off
checkwinsize    on
cmdhist         on
dotglob         off
execfail        off
expand_aliases  on
extdebug        off
extglob         off
extquote        on
failglob        off
force_fignore   on
gnu_errfmt      off
histappend      off
histreedit      off
histverify      off
hostcomplete    on
huponexit       off
interactive_comments    on
lithist         off
login_shell     on
mailwarn        off
no_empty_cmd_completion off
nocaseglob      off
nocasematch     off
nullglob        off
progcomp        on
promptvars      on
restricted_shell        off
shift_verbose   off
sourcepath      on
xpg_echo        off
$ help shopt
shopt: shopt [-pqsu] [-o long-option] optname [optname…]
Toggle the values of variables controlling optional behavior.
The -s flag means to enable (set) each OPTNAME; the -u flag
unsets each OPTNAME.  The -q flag suppresses output; the exit
status indicates whether each OPTNAME is set or unset.  The -o
option restricts the OPTNAMEs to those defined for use with
`set -o’.  With no options, or with the -p option, a list of all
settable options is displayed, with an indication of whether or
not each is set.

If set, Bash checks the window size after each command and, if
necessary, updates the values of LINES and COLUMNS.

Ah nice.  My shell wont send a SIGHUP to jobs while i’m logging out of an interactive shell.  =)