Perl and greediness


When a regular expression uses the ‘*’ quantifier to match text, it will attempt to match as much of the input as possible. Consider the following Perl code, which uses the regular expression “(Some.*text)”:

$ cat test.pl

#!/usr/bin/perl

my $string = "Some chunk of text that has text";

# Greedy: .* consumes as much as it can before matching "text"
$string =~ /(Some.*text)(.*)/;
print "One: $1\nTwo: $2\n";

We see that by default Perl will match from the word “Some” to the right-most word “text”:

$ test.pl
One: Some chunk of text that has text
Two:

In regular expression parlance, this is considered a “greedy” match, since the quantifier attempts to match as much as possible. This is not ideal in many situations, and is easily fixed by appending Perl’s ‘?’ modifier to the quantifier:

$ cat test.pl

#!/usr/bin/perl

my $string = "Some chunk of text that has text";

# Non-greedy: .*? consumes as little as it can before matching "text"
$string =~ /(Some.*?text)(.*)/;
print "One: $1\nTwo: $2\n";

$ test.pl
One: Some chunk of text
Two:  that has text

In this example the ‘*’ quantifier followed by ‘?’ is no longer greedy: it matches as little as possible, so the first capture stops at the left-most occurrence of “text” and the remainder of the string lands in the second capture. Regular expressions are amazingly cool, but sometimes it feels like witchcraft when developing the right regex incantation to solve complex problems.
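To see the two behaviors side by side, here is a small sketch that runs both patterns against the same string. The variable names $greedy and $lazy are made up for illustration, and the second capture group is dropped to keep the comparison short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Some chunk of text that has text";

# Greedy: .* grabs as much as it can, so the match ends at the last "text".
my ($greedy) = $string =~ /(Some.*text)/;

# Non-greedy: .*? grabs as little as it can, so the match ends at the first "text".
my ($lazy) = $string =~ /(Some.*?text)/;

print "Greedy: $greedy\n";
print "Lazy:   $lazy\n";
```

Run against this string, the greedy pattern should capture the whole sentence, while the non-greedy one should stop after “Some chunk of text”.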

This article was posted by Matty on 2005-09-19 21:49:00 -0400