I am a huge fan of awk, and I constantly find myself using it to parse simple data streams. awk provides numerous string functions, which are an invaluable resource for awk script developers. To illustrate some of the cool ones, I created a single-line text file containing the string “string of string”:
$ cat text
string of string
To get the length of each line in the file text, the length() function can be used:
$ awk '{ i = length($0); print i }' text
16
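Since length() yields a number, it can also drive comparisons. Here is a minimal sketch that reports the longest line of its input; the function name longest_line and the sample input are made up for illustration:

```shell
# Hypothetical helper: print the length and text of the longest input line.
longest_line() {
    # max starts out unset, which awk treats as 0 in a numeric comparison.
    awk 'length($0) > max { max = length($0); line = $0 }
         END { print max ": " line }'
}

printf 'string of string\nawk\n' | longest_line
# prints "16: string of string"
```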
To see if the string “ing” is present in the file text, the index() function can be used:
$ awk '{ i = index($0,"ing"); print i }' text
4
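When the substring is absent, index() returns 0, so the result can double as a presence test. A small sketch of that idea, where the function name find_ing and the second input line are assumptions for the example:

```shell
# Hypothetical helper: report where "ing" first appears in each line, if at all.
find_ing() {
    awk '{
        pos = index($0, "ing")      # 0 means "not found"
        if (pos > 0)
            print "found at " pos
        else
            print "absent"
    }'
}

printf 'string of string\nno match here\n' | find_ing
# prints "found at 4", then "absent"
```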
index() returns the position of the first occurrence of “ing,” counting from 1, which can then be used to facilitate further string processing. To retrieve a range of characters from a string, pass a starting position and a length (not an ending offset) to the awk substr() function:
$ awk '{ i = substr($0,5,10); print i }' text
ng of stri
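The two functions combine nicely: index() finds where something starts, and substr() cuts from there. If the length argument is omitted, substr() returns everything through the end of the string. A rough sketch, with the helper name from_of invented for this example:

```shell
# Hypothetical helper: print each line from its first "of" onward.
from_of() {
    awk '{
        pos = index($0, "of")       # position of the first "of", or 0
        if (pos > 0)
            print substr($0, pos)   # no length given: take the rest of the line
    }'
}

printf 'string of string\n' | from_of
# prints "of string"
```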
And finally, to tokenize a string (split a line into word-length pieces), the split() function can be used:
$ awk '{ n = split($0,a," "); for (i = 1; i <= n; i++) print a[i] }' text
string
of
string
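The third argument to split() is the separator, so the same approach works for delimited data other than whitespace. A sketch assuming colon-separated input; the function name split_fields and the sample record are illustrative only:

```shell
# Hypothetical helper: split each line on ":" and print numbered pieces.
split_fields() {
    awk '{
        n = split($0, parts, ":")   # n is the number of pieces found
        for (i = 1; i <= n; i++)
            print i ": " parts[i]
    }'
}

printf 'root:x:0:0\n' | split_fields
# prints "1: root", "2: x", "3: 0", "4: 0" on separate lines
```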
I dig awk!