Using awk character classes to simplify parsing complex strings


This week I was reading a shell script in a GitHub repository to see if it would be a good candidate to automate a task. As I was digging through the code I noticed a lengthy shell pipeline that parsed a string similar to this:

Thu Jul 20 18:13:04 EDT 2017 snarble foo bar (gorp): blatch (fmep): gak+

Here is the code they were using to extract the string “gorp”:

$ cat /foo/bar.txt | grep "snarble" | awk '{print $10}' | awk -F'(' '{print $2}' | awk -F')' '{print $1}'
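To see what each stage of that pipeline contributes, here is a minimal sketch that feeds the sample line through it one step at a time. It assumes the field reference is meant to be `$10` (the tenth whitespace-separated field), uses printf in place of /foo/bar.txt, and the shell variable `line` is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Hypothetical variable holding the sample line; printf stands in for /foo/bar.txt.
line='Thu Jul 20 18:13:04 EDT 2017 snarble foo bar (gorp): blatch (fmep): gak+'

# Stage 1: default whitespace splitting, keep field 10
printf '%s\n' "$line" | awk '{print $10}'
# prints "(gorp):"

# Stage 2: split that on "(" and keep what follows it
printf '%s\n' "$line" | awk '{print $10}' | awk -F'(' '{print $2}'
# prints "gorp):"

# Stage 3: split that on ")" and keep what precedes it
printf '%s\n' "$line" | awk '{print $10}' | awk -F'(' '{print $2}' | awk -F')' '{print $1}'
# prints "gorp"
```

Stage 1 only works because the parenthesized word happens to be the tenth field; one more or one fewer word earlier in the line would break the whole chain.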

After my eyes recovered I thought this would be a good candidate to simplify with awk character classes. Used as a field separator, a character class lets awk treat any of several different characters as a delimiter on the same line of input. I took what the original author had and simplified it to this:

$ awk -F'[()]+' '/snarble/ {print $2}' /foo/bar.txt
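A quick way to convince yourself the one-liner is equivalent is to run both versions against the same input. This sketch assumes a throwaway temporary file in place of /foo/bar.txt, filled with the sample line plus a hypothetical second line that does not contain snarble; the variables `tmp`, `old` and `new` are illustrative names.

```shell
#!/bin/sh
# Hypothetical stand-in for /foo/bar.txt: one matching line, one non-matching line.
tmp=$(mktemp)
printf '%s\n' \
  'Thu Jul 20 18:13:04 EDT 2017 snarble foo bar (gorp): blatch (fmep): gak+' \
  'Thu Jul 20 18:14:10 EDT 2017 other foo bar (skip): blatch (fmep): gak+' > "$tmp"

# Original pipeline (field reference written as $10)
old=$(cat "$tmp" | grep "snarble" | awk '{print $10}' | awk -F'(' '{print $2}' | awk -F')' '{print $1}')

# Single awk: split on runs of "(" or ")", act only on lines matching /snarble/
new=$(awk -F'[()]+' '/snarble/ {print $2}' "$tmp")

rm -f "$tmp"
printf 'old: %s\nnew: %s\n' "$old" "$new"
```

Both variables end up holding gorp, and the second line is ignored by both versions.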

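The argument passed to the field separator option (-F) is a regular expression. The character class [()] matches either an opening or a closing parenthesis, and the trailing + treats a run of them as a single separator, so for the sample line $2 is gorp. The regular expression inside the slashes limits the action to lines that contain the word snarble, which takes the place of the grep. One awk process now does the work of cat, grep and three separate awk invocations. I find the second version a lot easier to read, and character classes are super useful!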
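To make the field splitting concrete, this sketch prints every field that -F'[()]+' produces for the sample line, with brackets around each value so leading and trailing spaces are visible. It then counts fields on a made-up string, 'a ((b)) c' (purely hypothetical input), with and without the + to show why the + is there. The variable `line` is an illustrative name.

```shell
#!/bin/sh
line='Thu Jul 20 18:13:04 EDT 2017 snarble foo bar (gorp): blatch (fmep): gak+'

# Print each field on its own line as $N=[value]
printf '%s\n' "$line" | awk -F'[()]+' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
# $1=[Thu Jul 20 18:13:04 EDT 2017 snarble foo bar ]
# $2=[gorp]
# $3=[: blatch ]
# $4=[fmep]
# $5=[: gak+]

# With +, adjacent parentheses collapse into one separator
printf '%s\n' 'a ((b)) c' | awk -F'[()]+' '{print NF}'
# prints 3

# Without +, every parenthesis is its own separator, leaving empty fields between them
printf '%s\n' 'a ((b)) c' | awk -F'[()]' '{print NF}'
# prints 5
```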

This article was posted by Matty on 2017-07-21 08:43:00 -0400