Finding approximate matches in a data file with agrep


A few weeks back I ran into a situation that required me to locate a piece of data in a file that contained several variations of it. I proceeded to grep for each form of the string (e.g., “teh”, “the”, “tte”) I could think of, but wasn’t getting the results I wanted. After a bit of poking around, I came across the incredibly useful agrep utility. This utility allows you to look for approximate matches in files, and lets you specify the maximum number of errors (inserted, deleted, or substituted characters) a match may contain. Say you were given a file with several variations of the string “the”:

$ cat input.txt

teh
the
tte
thw

You could locate each string by running agrep with the string you want to look for and a maximum of one error (the -1 option):

$ agrep -1 the input.txt

teh
the
tte
thw
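
To see why all four lines match, it helps to count the errors by hand. The Python sketch below is only an illustration of the idea, not agrep's actual implementation: it assumes each inserted, deleted, or substituted character costs one error and that a match may start anywhere in the line. The function name min_errors is made up for this example.

```python
def min_errors(pattern: str, line: str) -> int:
    """Fewest single-character edits needed to turn `pattern`
    into some substring of `line` (hypothetical helper)."""
    # prev[i] = cost of matching pattern[:i] against a substring that
    # ends just before the current character of `line`.
    # Before reading any of the line, pattern[:i] costs i deletions.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in line:
        # The empty prefix always costs 0: a match may start anywhere.
        cur = [0]
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (pc != ch),  # same character, or a substitution
                prev[i] + 1,               # extra character in the line
                cur[i - 1] + 1,            # pattern character missing from the line
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


if __name__ == "__main__":
    # Lines are reported as matches when they need at most one error.
    for line in ["teh", "the", "tte", "thw"]:
        errors = min_errors("the", line)
        print(line, errors, "match" if errors <= 1 else "no match")
```

Under these assumptions “the” needs zero errors and the other three lines need one each. “teh”, for example, contains “te”, which is “the” with the “h” dropped. A line such as “cat” would need two errors, so it would only show up if you raised the limit to -2.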

This is a useful utility and one I hope my fellow SysAdmins enjoy. Hope everyone had a merry Christmas!

This article was posted by Matty on 2010-12-27 12:34:00 -0400