
Recipe 9.6 Matching Patterns with Regular Expressions

9.6.1 Problem

You want to match a pattern within a string instead of finding a specific substring.

9.6.2 Solution

Use a regular expression and the RegExp.exec( ) method.

9.6.3 Discussion

Many programming languages support regular expressions to match patterns in strings. You may already be familiar with simpler kinds of pattern matching: for example, the Windows file search feature lets you use wildcards such as * and ?. Regular expressions support much more sophisticated patterns.

ActionScript does not provide native support for regular expressions. However, several third-party classes are publicly available. One such class is the RegExp class by Pavils Jurjans, which is very similar to the JavaScript 1.3 RegExp class. (JavaScript 1.5 adds RegExp features that the ActionScript RegExp class does not support.)

Table 9-2 summarizes regular expression pattern-matching operations.

Table 9-2. Regular expressions

Expression | Matches | Example
? | The preceding character zero or one time (that is, the preceding character is optional). | ta?k matches "tak" or "tk" but not "tik" or "taak"
* | The preceding character zero or more times. | wo*k matches "wok", "wk", or "woook" but not "wak"
+ | The preceding character one or more times. | craw+l matches "crawl" or "crawwl" but not "cral"
. (period) | Any one character except newline. | c.ow matches "crow" or "clow" but not "cow"
^ | Specified string at the beginning of a line. | ^wap matches "wap" but not "swap"
$ | Specified string at the end of a line. (The "$" metacharacter should be at the end of the pattern, such as w$. Though RegExp accepts it at the beginning of the pattern, such as $w, this feature is not supported by the ECMA standard.) | ow$ matches "ow" but not "owl"
x|y | Either x or y. | one|two matches "one" or "two" but not "ten"
[abc] | Any one of the characters within the brackets. | l[aeo]g matches "lag", "leg", or "log" but not "lig"
[a-z] | Any one character within the range. | ^[0-3]+$ matches "1320" but not "4523"
[^abc] | Any one character other than those listed. | l[^aeo]g matches "lig" but not "lag", "leg", or "log"
[^a-z] | Any one character not in the range. | ^[^0-3]+$ matches "4758" but not "4931"
{n} | Exactly n occurrences of the preceding character. | cre{2}l matches "creel" but not "crel" or "creeel"
{n,m} | At least n but no more than m occurrences of the preceding character. | cre{2,3}l matches "creel" or "creeel" but not "crel"
\b | Word boundary. | \brow matches the "row" in "up row" but not in "uprow"
\B | Any position that is not a word boundary. | \Brow matches the "row" in "uprow" but not in "up row"
\d | Any numeric digit; same as [0-9]. | ^\d+$ matches "13243" but not "13A46"
\D | Any nondigit character; same as [^0-9]. | \D+ matches the "ABC" in "1ABC3" but nothing in "13946"
\s | Single whitespace character (space, tab, line feed, or form feed). | \s matches the space in "King Tut"
\S | Single non-whitespace character. | \STut matches "gTut" but not "Tut"
\w | Any alphanumeric character (in JavaScript, also the underscore; same as [A-Za-z0-9_]). | a\wm matches "arm" or "a8m" but not "a-m"
\W | Any character that \w does not match. | a\Wm matches "a-m" but not "aim" or "a7m"
\x | The literal character x, where x is a metacharacter that would otherwise have a special meaning. | \/ finds slashes; \( finds opening parentheses, etc.
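
A few rows of Table 9-2 can be checked mechanically. The following is only a sketch: it uses JavaScript's built-in RegExp, on which the ActionScript class is modeled, rather than the ActionScript class itself, so results in Flash may differ. The helper names matchesAll and matchesNone are made up for this illustration.

```javascript
// JavaScript stand-in for the ActionScript RegExp class (an assumption).

// Hypothetical helper: true only if the pattern is found in every sample.
function matchesAll(pattern, samples) {
  var re = new RegExp(pattern);
  return samples.every(function (s) { return re.test(s); });
}

// Hypothetical helper: true only if the pattern is found in none of the samples.
function matchesNone(pattern, samples) {
  var re = new RegExp(pattern);
  return samples.every(function (s) { return !re.test(s); });
}

// Each line checks one claim from the table; all of them display true.
console.log(matchesAll("ta?k", ["tak", "tk"]));
console.log(matchesNone("ta?k", ["tik", "taak"]));
console.log(matchesAll("l[^aeo]g", ["lig"]));
console.log(matchesNone("l[^aeo]g", ["lag", "leg", "log"]));
console.log(matchesAll("cre{2,3}l", ["creel", "creeel"]));
console.log(matchesNone("cre{2,3}l", ["crel"]));
```

Note that test( ) reports a match anywhere in the sample, so a pattern without ^ and $ succeeds whenever any part of the string fits.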

The RegExp class does not try to interpret escape sequences that are natively interpreted by Flash. The ActionScript RegExp class interprets the escape sequences \d, \D, \s, \S, \w, \W, \b, and \B, but other escape sequences, such as \n and \t, are interpreted by Flash itself. Therefore, if you want to match a newline character, you should use the pattern "\n", but if you want to match a digit, you use the pattern "\\d" (note the double backslash before "d").
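
The difference between the two kinds of escape sequence is easiest to see by looking at the string the constructor actually receives. This sketch uses JavaScript's built-in RegExp as a stand-in for the ActionScript class (an assumption); string literals are escaped the same way in both languages.

```javascript
// JavaScript stand-in for the ActionScript RegExp class (an assumption).

// "\\d" in source code is the two-character string \d, which the
// regular expression engine then reads as "any digit".
var digit = new RegExp("\\d");
console.log("\\d".length);              // 2
console.log(digit.test("route 66"));    // true
console.log(digit.test("no digits"));   // false

// "\n" in source code is already a single newline character by the time
// the constructor sees it, so the pattern matches a literal newline.
var newline = new RegExp("\n");
console.log("\n".length);                         // 1
console.log(newline.test("line one\nline two"));  // true
console.log(newline.test("one line only"));       // false
```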

You must first download and install the RegExp class if you wish to use it in your Flash documents. You can download it from Pavils's web site:

http://www.jurjans.lv/flash/RegExp.html

You can download the ActionScript file itself (RegExp.as) or a zip file (RegExp.zip) that contains additional support files. Whichever download you choose, copy the RegExp.as file into your Flash installation's Include directory (Flash Installation/Configuration/Include). From there, you can easily include it in any Flash document.

The main difference between Pavils's ActionScript RegExp class and its JavaScript kin is that there is no way to define a regular expression using a regular expression literal (the /pattern/ syntax). Instead, you must always use the constructor:

// This will work in JavaScript but not in ActionScript.
re = /[a-z]*/;

// This is the proper way to create a regular expression in ActionScript.
re = new RegExp("[a-z]*");

// Create a regular expression that matches a backslash.
re = new RegExp("\\\\");

Because the RegExp object is created by passing a string to the constructor, all references to \ within the string must be escaped as \\. Since \ is also a special character in RegExp patterns, to search for a backslash in a regular expression, you must escape it like this: "\\\\".
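
The two levels of escaping can be traced step by step. This is a sketch in JavaScript, whose RegExp constructor the ActionScript class imitates; the variable names are arbitrary.

```javascript
// JavaScript stand-in for the ActionScript RegExp class (an assumption).

// Four backslashes in source code produce a string holding two backslashes.
var backslashPattern = "\\\\";
console.log(backslashPattern.length);        // 2

// The regular expression engine reads those two backslashes as
// one escaped, literal backslash.
var backslashRe = new RegExp(backslashPattern);
console.log(backslashRe.test("C:\\temp"));   // true  (the string is C:\temp)
console.log(backslashRe.test("C:/temp"));    // false
```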

When constructing a regular expression, you can specify a second parameter containing flags that modify its behavior. The most common flags are "i" for case-insensitive matching and "g" for global matching (each search resumes where the previous match ended, so repeated searches can find every match in the string instead of only the first). For example:

// This matches all letters a, b, c, A, B, and C.
re = new RegExp("[a-c]", "ig");
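
To see what each flag changes, compare the same pattern with and without it. The sketch below runs against JavaScript's built-in RegExp, which the ActionScript class is designed to resemble; it assumes, without having verified it in Flash, that the ActionScript class behaves the same way.

```javascript
// JavaScript stand-in for the ActionScript RegExp class (an assumption).
var text = "abc ABC";

// Without "i", only lowercase letters match; with "i", uppercase ones match too.
console.log(new RegExp("[a-c]").test("ABC"));        // false
console.log(new RegExp("[a-c]", "i").test("ABC"));   // true

// Without "g", every exec() call starts from the beginning of the string.
var once = new RegExp("[a-c]");
var first = once.exec(text)[0];
var again = once.exec(text)[0];
console.log(first, again);                           // a a

// With "g", each exec() call resumes after the previous match.
var everyMatch = new RegExp("[a-c]", "g");
var g1 = everyMatch.exec(text)[0];
var g2 = everyMatch.exec(text)[0];
console.log(g1, g2);                                 // a b
```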

Once you have created a regular expression that describes the pattern you want to search for, use the RegExp.exec( ) method to perform the search. The exec( ) method takes the string to search as a parameter and returns the match. When the regular expression was created with the "g" flag, each call to exec( ) searches for the next match. If no match is found, exec( ) returns null.

// You must include the third-party RegExp.as file from
// http://www.jurjans.lv/flash/RegExp.html.
#include "RegExp.as"

// Create a regular expression that matches three-letter words.
re = new RegExp("\\b[a-z]{3}\\b", "g");

myString = "This string has two three-letter words";

// Search the string for the pattern and display the first result: has.
match = re.exec(myString);
trace(match);

// Search the string again for the pattern and display the next result: two.
match = re.exec(myString);
trace(match);

// Search the string again. No more matches, so the result is null.
match = re.exec(myString);
trace(match);

The exec( ) method continues to cycle through the string with each call. After the method returns null, it will return to the beginning of the string for the next search.
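
In JavaScript, this cycling is driven by the regular expression's lastIndex property, which records where the next search will begin. The sketch below shows the property changing across four calls; it uses JavaScript's built-in RegExp, and whether the ActionScript class tracks its position in exactly the same way is an assumption.

```javascript
// JavaScript stand-in for the ActionScript RegExp class (an assumption).
var cycleRe = new RegExp("\\b[a-z]{3}\\b", "g");
var myString = "This string has two three-letter words";

// Call exec() four times and record each result with the position
// at which the following search will start.
var log = [];
for (var i = 0; i < 4; i++) {
  var match = cycleRe.exec(myString);
  log.push((match === null ? "null" : match[0]) + " -> lastIndex " + cycleRe.lastIndex);
}

console.log(log.join("\n"));
// has -> lastIndex 15
// two -> lastIndex 19
// null -> lastIndex 0
// has -> lastIndex 15
```

Because the position is stored on the regular expression rather than on the string, reusing one global regular expression on a different string without first exhausting it (or resetting lastIndex to 0) can skip matches near the start of the new string.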

You can use a while statement with the exec( ) method to find all the matches, like so:

#include "RegExp.as"

// Create a regular expression that matches three-letter words.
re = new RegExp("\\b[a-z]{3}\\b", "g");
myString = "This string has two three-letter words";

/* Loop until the exec(  ) method returns null. This while loop outputs:
   has
   two
*/
while ((match = re.exec(myString)) != null) {
  trace(match);
}
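
The same loop can be wrapped in a function that collects the matches into an array. This is a sketch rather than part of the RegExp class: findAll is a hypothetical helper name, and the code runs against JavaScript's built-in RegExp as a stand-in for the ActionScript class.

```javascript
// JavaScript stand-in for the ActionScript RegExp class (an assumption).

// Hypothetical helper: returns an array of every substring that matches.
function findAll(pattern, flags, text) {
  // Force the "g" flag; without it, exec() would return the first
  // match on every call and the loop would never end.
  if (flags.indexOf("g") === -1) {
    flags += "g";
  }
  var re = new RegExp(pattern, flags);
  var found = [];
  var match;
  while ((match = re.exec(text)) !== null) {
    found.push(match[0]);
    // An empty match does not advance the position, so step forward manually.
    if (match[0] === "") {
      re.lastIndex++;
    }
  }
  return found;
}

var sentence = "This string has two three-letter words";
console.log(findAll("\\b[a-z]{3}\\b", "", sentence));    // [ 'has', 'two' ]
console.log(findAll("\\b[a-z]{4}\\b", "i", sentence));   // [ 'This' ]
```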

The RegExp.test( ) method tests whether a string contains a match to a regular expression. The method returns true if the pattern is matched, and false otherwise. You can use test( ) to test whether a string is valid for a particular use, such as whether it takes the form of a valid email address. For example:

#include "RegExp.as"

// Create a regular expression that matches an email pattern.
re = new RegExp("^([\\w\\-\\.]+)@(([\\w\\-]+\\.)+[\\w\\-]+)$");

// Create an array of strings that may or may not be valid emails.
emails = new Array(  );
emails.push("someone@someserver.com");
emails.push("your.name@someplace.org");
emails.push("email goes here");

/* Test each array element to see whether it is a valid email. The results are:
   true
   true
   false
*/ 
for (var i = 0; i < emails.length; i++) {
  trace(re.test(emails[i]));
}
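
The ^ and $ anchors are what make test( ) check the whole string rather than any part of it. The sketch below compares the email pattern with and without them, using JavaScript's built-in RegExp as a stand-in for the ActionScript class; the variable names are arbitrary.

```javascript
// JavaScript stand-in for the ActionScript RegExp class (an assumption).
var emailPattern = "^([\\w\\-\\.]+)@(([\\w\\-]+\\.)+[\\w\\-]+)$";
var anchored = new RegExp(emailPattern);

// The same pattern with the leading ^ and trailing $ removed.
var unanchored = new RegExp(emailPattern.slice(1, -1));

var input = "write to someone@someserver.com today";
console.log(anchored.test(input));                      // false
console.log(unanchored.test(input));                    // true
console.log(anchored.test("someone@someserver.com"));   // true
console.log(anchored.test("someone@localhost"));        // false (no dot after the @)
```

A pattern like this checks only the general shape of an address; it cannot confirm that the address exists.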

9.6.4 See Also

A detailed discussion of regular expressions is beyond the scope of this book. A good primer on the JavaScript RegExp class can be found at http://devedge.netscape.com/library/manuals/2000/javascript/1.3/guide/regexp.html. JavaScript: The Definitive Guide by David Flanagan (O'Reilly) includes detailed coverage of using regular expressions in JavaScript. See Mastering Regular Expressions by Jeffrey E. F. Friedl (O'Reilly) for extensive practice with regular expressions. Also refer to Recipe 9.4 and Recipe 9.7. Recipe 9.9 demonstrates using regular expressions to remove nonalphanumeric characters in a string. Also see Recipe 8.7, which covers filtering text input. Table A-1 lists the Unicode code points for the Latin 1 character set. Recipe 11.4 discusses validating data input.
