B.5 Greed

Quantifiers in the PHP interpreter's regular expression engine are greedy by default. This means they match as much as they can while still letting the rest of the pattern match. The pattern <b>.*</b> means "the string <b>, then zero or more characters, then the string </b>." The "more" in "zero or more" matches as many characters as possible. When the pattern is applied to the string <b>Look Out!</b> <i>Caution!</i> <b>Uh-Oh!</b>, the .* matches Look Out!</b> <i>Caution!</i> <b>Uh-Oh!. The greediness of the quantifier causes it to skip over the first </b> it sees and gobble up characters all the way to the last </b> in the string.
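To see the greedy match described above, here is a minimal sketch that applies the pattern to the same string with preg_match( ). The capturing parentheses around .* and the variable names $html, $whole, and $inner are additions for illustration only; they are not part of the pattern as written in the text.

```php
<?php
// The sample string from the paragraph above
$html = '<b>Look Out!</b> <i>Caution!</i> <b>Uh-Oh!</b>';

// The parentheses are added for this sketch: they capture what .* matched
if (preg_match('@<b>(.*)</b>@', $html, $matches)) {
    $whole = $matches[0];   // text matched by the entire pattern
    $inner = $matches[1];   // text matched by the greedy .* alone
    print "Whole match: $whole\n";
    print "Matched by .*: $inner\n";
}
```

Run as written, the captured text starts after the first <b> and runs up to the last </b>, so it includes the inner </b>, the <i> element, and the second <b>.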

To turn a quantifier from greedy to nongreedy, put a question mark after it. The pattern <b>.*?</b> still matches "the string <b>, then zero or more characters, then the string </b>", but now the "more" in "zero or more" matches as few characters as possible. Example B-1 shows the difference between greedy and nongreedy matching with preg_match_all( ). (Example B-5 details how preg_match_all( ) works, including the meaning of the @ characters at the start and end of the pattern.)

Example B-1. Greedy and nongreedy matching
$meats = "<b>Chicken</b>, <b>Beef</b>, <b>Duck</b>";

// With a nongreedy quantifier, each meat is matched separately
preg_match_all('@<b>.*?</b>@',$meats,$matches);
foreach ($matches[0] as $meat) {
    print "Meat A: $meat\n";
}

// With a greedy quantifier, the whole string is matched just once
preg_match_all('@<b>.*</b>@',$meats,$matches);
foreach ($matches[0] as $meat) {
    print "Meat B: $meat\n";
}

Example B-1 prints:

Meat A: <b>Chicken</b>
Meat A: <b>Beef</b>
Meat A: <b>Duck</b>
Meat B: <b>Chicken</b>, <b>Beef</b>, <b>Duck</b>

The nongreedy quantifier in the first pattern makes the first match by preg_match_all( ) stop short at the first </b> it sees. This leaves part of $meats to be matched by subsequent applications of the pattern by preg_match_all( ).

But with the greedy quantifier in the second pattern, the first match by preg_match_all( ) scoops up all of the text from the first <b> to the last </b>, leaving nothing matchable for subsequent applications of the pattern.
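One quick way to check how many times each pattern was applied successfully is to look at the return value of preg_match_all( ), which is the number of complete matches found. The following is a small sketch using the same $meats string; the variable names $nongreedy_count and $greedy_count are chosen here for illustration.

```php
<?php
$meats = "<b>Chicken</b>, <b>Beef</b>, <b>Duck</b>";

// preg_match_all() returns the number of complete pattern matches
$nongreedy_count = preg_match_all('@<b>.*?</b>@', $meats, $matches);
$greedy_count    = preg_match_all('@<b>.*</b>@', $meats, $matches);

print "Nongreedy matches: $nongreedy_count\n";
print "Greedy matches: $greedy_count\n";
```

This prints a count of 3 for the nongreedy pattern and 1 for the greedy pattern, matching the three "Meat A" lines and the single "Meat B" line in the output of Example B-1.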
