Using awk character classes to simplify parsing complex strings

This week I was reading a shell script in a GitHub repository to see if it would be a good candidate to automate a task. As I was digging through the code, I noticed a lengthy shell pipeline that parsed a string similar to this:

Thu Jul 20 18:13:04 EDT 2017 snarble foo bar (gorp): blatch (fmep): gak+

Here is the code the author was using to extract the string “gorp”:

$ cat /foo/bar.txt | grep "snarble" | awk '{print $10}' | awk -F'(' '{print $2}' | awk -F')' '{print $1}'

After my eyes recovered, I thought this would be a good candidate to simplify with awk character classes. A character class in the field separator lets awk split a line on any of several delimiter characters at once. I took what the original author had and simplified it to this:

$ awk -F'[()]+' '/snarble/ {print $2}' /foo/bar.txt

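The argument passed to the field separator option (-F) is a regular expression: the character class [()] lists the characters to use as delimiters, and the trailing + treats a run of them as a single separator. The pattern between the slashes restricts the action to lines that contain the word snarble. With that separator, the text inside the first pair of parentheses lands in $2. I find the second version a bit easier to read, and character classes are super useful!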
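To see where each field boundary falls, here is a minimal sketch that prints every field with its index. It pipes the sample line from above through printf instead of reading /foo/bar.txt, and the variable names `line` and `fields` are only there for illustration.

```shell
# Sample line from this post, held in a variable for illustration.
line='Thu Jul 20 18:13:04 EDT 2017 snarble foo bar (gorp): blatch (fmep): gak+'

# Split on runs of "(" or ")" and print each field as index=[value].
fields=$(printf '%s\n' "$line" |
  awk -F'[()]+' '/snarble/ { for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
# 1=[Thu Jul 20 18:13:04 EDT 2017 snarble foo bar ]
# 2=[gorp]
# 3=[: blatch ]
# 4=[fmep]
# 5=[: gak+]
```

The parentheses themselves never show up in a field, which is why $2 comes out as a clean gorp and $4 would give you fmep if you needed it.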

1 thought on “Using awk character classes to simplify parsing complex strings”

  1. Not all awks support regexes to split a string. Personally, I would have used ‘sed’ to extract the string enclosed in parentheses, something like (untested): grep snarble bar.txt | sed -e 's/^[^(]*(\([^)]*\)).*$/\1/'
