Prefetch Technologies // Keeping your cache lines cozy

Using awk functions

I am a huge fan of awk, and find myself constantly using it to parse simple data streams. awk contains numerous string-related functions, which are an invaluable resource for awk script developers. To illustrate some of the cool awk functions, I created a single-line text file with the string "string of string":

$ cat text
string of string

To get the length of each line in the file text, the length() function can be used:

$ awk '{ i = length($0); print i }' text
16
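
Since the action runs once per input line, the same call reports a length for every line of a longer file. Here is a minimal sketch, assuming a made-up two-line input fed through printf rather than the file text:

```shell
# Hypothetical two-line input; NR is the current line number.
out=$(printf 'string of string\nawk\n' | awk '{ print NR ": " length($0) }')
echo "$out"
```

Passing a field such as $1 instead of $0 measures just that field.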

To see if the string "ing" is present in the file text, the index() function can be used:

$ awk '{ i = index($0, "ing"); print i }' text
4

index() returns the 1-based position of the first occurrence of "ing", or 0 if the string is not present, and that position can then be used to facilitate further string processing. To retrieve a range of characters in a string, a starting position and a length (not an ending offset) can be passed to the awk substr() function:

$ awk '{ i = substr($0, 5, 10); print i }' text
ng of stri
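
The two functions combine nicely: index() finds where something starts, and substr() cuts from there. A rough sketch, assuming we want everything from the first "of" onward (the search string and the variable name pos are just picked for illustration):

```shell
out=$(echo 'string of string' | awk '{
  pos = index($0, "of")      # 1-based position, or 0 if absent
  if (pos > 0)
    print substr($0, pos)    # no length argument: run to the end of the string
}')
echo "$out"
```

Checking pos > 0 first keeps lines without a match from being printed whole.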

And finally, to tokenize a string (split a line into word-length pieces), the split() function can be used. split() fills the array a and returns the number of elements it stored:

$ awk '{
  n = split($0, a, " ")
  for (i = 1; i <= n; i++) print a[i]
}' text
string
of
string
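
The third argument to split() does not have to be a space. As a small sketch, assuming a made-up colon-delimited record (the input and the array name parts are hypothetical):

```shell
out=$(echo 'root:x:0:0' | awk '{
  n = split($0, parts, ":")    # n is the number of pieces found
  for (i = 1; i <= n; i++)
    print i ": " parts[i]
}')
echo "$out"
```

Array indexes start at 1, so the loop runs from 1 through n.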

I dig awk!