Finding approximate matches in a data file with agrep

A few weeks back I ran into a situation that required me to locate a piece of data in a file that contained several variations of it. I proceeded to grep for each form of the string (e.g., “teh”, “the”, “tte”) I could think of, but wasn’t getting the results I wanted. After a bit of poking around, I came across the incredibly useful agrep utility. This utility lets you look for approximate matches in files and specify the maximum number of differences (errors) a match may contain. If you were given a file with several variations of the string “the”:

$ cat input.txt
teh
the
tte
thw

You could locate each string by running agrep with the string you want to look for and a maximum error count of 1 (the -1 option), where an error is an inserted, deleted, or substituted character:

$ agrep -1 the input.txt
teh
the
tte
thw
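
To see why all four lines match, here is a minimal Python sketch of approximate substring matching. It is not how agrep is implemented internally; it is a plain dynamic-programming illustration under two assumptions: each inserted, deleted, or substituted character costs one error, and the pattern may match anywhere within a line. The function names min_errors and approx_grep are made up for this example.

```python
def min_errors(pattern, line):
    """Return the fewest edits needed to turn pattern into some substring of line."""
    # prev[i] = fewest edits to match pattern[:i] against a substring
    # ending at the previous position in the line.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]  # cost of matching the empty substring
    for ch in line:
        cur = [0]  # a match may start anywhere in the line at no cost
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # exact match or substitution
                prev[i] + 1,              # extra character in the line
                cur[i - 1] + 1,           # pattern character missing from the line
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def approx_grep(pattern, lines, max_errors):
    """Keep the lines that contain pattern with at most max_errors edits."""
    return [ln for ln in lines if min_errors(pattern, ln) <= max_errors]


lines = ["teh", "the", "tte", "thw"]
print(approx_grep("the", lines, 1))  # ['teh', 'the', 'tte', 'thw']
print(approx_grep("the", lines, 0))  # ['the']
```

Under this model “teh” counts as a one-error match because it contains “te”, which is “the” with one character deleted; the swapped letters are not treated as a single error of their own.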

This is a useful utility and one I hope my fellow SysAdmins enjoy. Hope everyone had a merry Christmas!

1 thought on “Finding approximate matches in a data file with agrep”

  1. agrep is a wonderful tool! I have also been digging around for a solution to a similar task and I found your article here (I have been reading your blog for 2 years or so). Thank you very much! Before reading this, my only idea was one scary sed/grep command or an even scarier perl script to deal with it. The sed/grep variant was too clumsy and hard to modify, and perl…let’s say that my perl skills are not advanced enough to finish the task in a timely manner :)
    Sorry for my English – I should study and practice it more but I believe it is understandable at least :)
