Sorting data by dates, numbers and much much more


Every year or two I try to re-read manual pages and documentation about my favorite UNIX tools (bash, awk, sed, grep, etc.). Each time I do this I pick up some cool new nugget of information, and refresh my mind on things that I may have forgot. While reading through an article on sort, I came across the following note about the sort “-k” (field to sort by) option:

“Further modifications of the sorting algorithm are possible with these options: -d (use only letters, digits, and blanks for sort keys), -f (turn off case recognition and treat lowercase and uppercase characters as identical), -i (ignores non-printing ASCII characters), -M (sorts lines using three-letter abbreviations of month names: JAN, FEB, MAR, …), -n (sorts lines using only digits, -, and commas, or other thousands separator). These options, as well as -b and -r, can be used as part of a key number, in which case they apply to that key only and not globally, like they do when they are used outside key definitions."*

This is crazy useful, and I didn’t realize sort could be used to sort by date. I put this to use today, when I had to sort a slew of data that looked similar to this:

Jun 10 05:17:47 some_data_string
May 20 05:17:48 some_data_string2
Jun 17 05:17:49 some_data_string0

I was able to first sort by the month, and then by the day of the month:

$ awk '{printf "%-3s %-2s %-8s %-50s ", 1, 2, 3, 4 }' data | sort -k1M -k2n

May 17 05:17:49 some_data_string0
Jun 01 05:17:47 some_data_string
Jun 20 05:17:48 some_data_string2

Awesome stuff, and I will definitely be using this again in the future!!!

This article was posted by Matty on 2010-06-24 08:56:00 -0400 -0400