When a regular expression uses the ‘*’ quantifier to match text, it will attempt to match as much text as possible. Given the following Perl code with the regular expression “(Some.*text)”:
$ cat test.pl
#!/usr/bin/perl
my $string = "Some chunk of text that has text";
$string =~ /(Some.*text)(.*)/;
print "One: $1\nTwo: $2\n";
We see that by default Perl will match from the word “Some” to the right-most word “text”:
$ test.pl
One: Some chunk of text that has text
Two:
In regular expression parlance, this is considered a “greedy” match, since the quantifier attempts to match as much as possible. This is often not what you want, and is easily fixed by appending ‘?’ to the quantifier, which makes it non-greedy:
$ cat test.pl
#!/usr/bin/perl
my $string = "Some chunk of text that has text";
$string =~ /(Some.*?text)(.*)/;
print "One: $1\nTwo: $2\n";
$ test.pl
One: Some chunk of text
Two: that has text
In this example Perl is no longer greedy when evaluating the expression: ‘.*?’ matches as few characters as possible, so the first group ends at the left-most occurrence of the pattern that follows the quantifier (here, “text”). Regular expressions are amazingly cool, but sometimes it feels like witchcraft when developing the right regex incantation to solve complex problems.
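To see the two behaviors side by side, here is a minimal sketch that runs both patterns against the same string. The variable names ($greedy_one, $lazy_one, and so on) are made up for illustration, and the square brackets in the output are only there to make leading whitespace and empty captures visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Some chunk of text that has text";

# Greedy: .* grabs as much as it can, then backs off to the last "text".
# In list context the match returns the captured groups.
my ($greedy_one, $greedy_two) = $string =~ /(Some.*text)(.*)/;

# Non-greedy: .*? grabs as little as it can, stopping at the first "text".
my ($lazy_one, $lazy_two) = $string =~ /(Some.*?text)(.*)/;

print "Greedy one: [$greedy_one]\n";
print "Greedy two: [$greedy_two]\n";
print "Lazy one:   [$lazy_one]\n";
print "Lazy two:   [$lazy_two]\n";
```

The same ‘?’ suffix works on Perl's other quantifiers as well, for example ‘+?’ and ‘{2,5}?’.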