B.4 Character Classes

A character class lets you represent a bunch of characters (a "class") as a single item in a regular expression. Put characters in square brackets to make a character class. A character class matches any one of the characters in the class. This pattern matches a person's name or a bird's name:

    ^D[ao]ve$

The pattern matches Dave or Dove. The character class [ao] matches either a or o.

To put a whole range of characters in a character class, just put the first and last characters in, separated by a hyphen. For instance, to match all English alphabetic characters:

    [a-zA-Z]

When you use a hyphen in a character class to represent a range, the character class includes the first and last characters and every character whose ASCII value falls between them. If you want a literal hyphen inside a character class, backslash-escape it (a hyphen placed first or last in the class is also treated literally). The character class [a-z] is the same as [abcdefghijklmnopqrstuvwxyz], but the character class [a\-z] matches only three characters: a, -, and z.

You can also create a negated character class, which matches any character that is not in the class. To create a negated character class, begin the character class with ^:

    // Match everything but letters
    [^a-zA-Z]

The character class [^a-zA-Z] matches every character that isn't an English letter: digits, punctuation, whitespace, and control characters. Although ^ is an anchor outside of character classes, inside a character class its only special meaning is negation, and only when it comes first. If you want a literal ^ inside a character class, either don't put it first or backslash-escape it. Each of these patterns matches the same strings:

    [0-9][%^][0-9]
    [0-9][\^%][0-9]

Each pattern matches a digit, then either % or ^, then another digit. This matches strings such as 5^5, 3%2, or 1^9.

Character classes are more efficient than alternation when choosing among single characters. Instead of s(a|o|i)p, which matches sap, sop, and sip, use s[aoi]p.

Some commonly used character classes are also represented by dedicated metacharacters, which are more concise than specifying every character in the class. These metacharacters are shown in Table B-3.
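The character-class rules in this section can be checked with a short script. This is a sketch that assumes Python's re module as a stand-in for the PCRE-style engine the appendix describes; the features used here (plain classes, ranges, escaped and trailing hyphens, negation, literal carets) behave the same way in both. The helper find_all_matching and the pattern constant names are hypothetical, chosen only for illustration.

```python
import re

# Sketch: Python's re module stands in for the PCRE engine described in
# the text; these character-class features behave the same in both.


def find_all_matching(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


# [ao] matches exactly one character: either a or o.
DAVE_OR_DOVE = r"^D[ao]ve$"

# A range includes both endpoints and everything between them.
LOWER_RANGE = r"[a-z]"
LOWER_SPELLED_OUT = r"[abcdefghijklmnopqrstuvwxyz]"

# An escaped hyphen is literal, so this class has exactly three members.
A_DASH_Z = r"[a\-z]"
# A hyphen at the end of a class is also taken literally.
A_Z_DASH = r"[az-]"

# A leading ^ negates the class: any one character that isn't an English letter.
NOT_A_LETTER = r"[^a-zA-Z]"

# A ^ that is not first, or that is escaped, is just a literal caret.
CARET_NOT_FIRST = r"[0-9][%^][0-9]"
CARET_ESCAPED = r"[0-9][\^%][0-9]"

# Alternation and a character class choosing among the same single characters.
SIP_ALTERNATION = r"s(a|o|i)p"
SIP_CLASS = r"s[aoi]p"

if __name__ == "__main__":
    print(find_all_matching(DAVE_OR_DOVE, ["Dave", "Dove", "Dive", "Daove"]))
    # ['Dave', 'Dove']

    ascii_chars = [chr(i) for i in range(128)]
    print(find_all_matching(LOWER_RANGE, ascii_chars)
          == find_all_matching(LOWER_SPELLED_OUT, ascii_chars))  # True

    print(find_all_matching(A_DASH_Z, ascii_chars))  # ['-', 'a', 'z']
    print(find_all_matching(A_Z_DASH, ascii_chars))  # ['-', 'a', 'z']

    print(find_all_matching(NOT_A_LETTER, ["Q", "q", "7", "!", " "]))
    # ['7', '!', ' ']

    samples = ["5^5", "3%2", "1^9", "5*5", "a^b"]
    print(find_all_matching(CARET_NOT_FIRST, samples))  # ['5^5', '3%2', '1^9']
    print(find_all_matching(CARET_ESCAPED, samples))    # ['5^5', '3%2', '1^9']

    words = ["sap", "sop", "sip", "sup", "soap"]
    print(find_all_matching(SIP_ALTERNATION, words)
          == find_all_matching(SIP_CLASS, words))  # True
```

Comparing against every ASCII character is a cheap way to confirm that two classes really contain the same members, rather than trusting a handful of hand-picked inputs.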
These metacharacters can be used just like character classes. This pattern, in which \d matches any digit, matches valid 24-hour clock times in HH:MM form, such as 09:30 or 23:59:

    ([0-1]\d|2[0-3]):[0-5]\d

You can also include these metacharacters inside a character class with other characters. This pattern matches hexadecimal numbers:

    [\da-fA-F]+
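A similar sketch for the 24-hour time and hexadecimal patterns, again assuming Python's re module as a stand-in for PCRE. The function names is_24h_time and is_hex_number are hypothetical. Two assumptions are worth flagging: in Python 3, \d in a string pattern also matches non-ASCII digits, so re.ASCII is passed to keep it to 0-9; and because the patterns are not anchored, the sketch uses fullmatch to require the whole string to match, with one search call showing what happens without that.

```python
import re

# \d matches a digit. In Python 3 a str pattern's \d also matches non-ASCII
# digits, so re.ASCII restricts it to 0-9 as the text assumes.
TIME_24H = re.compile(r"([0-1]\d|2[0-3]):[0-5]\d", re.ASCII)
HEX_NUMBER = re.compile(r"[\da-fA-F]+", re.ASCII)


def is_24h_time(text):
    """True if the whole string is an HH:MM time from 00:00 to 23:59."""
    return TIME_24H.fullmatch(text) is not None


def is_hex_number(text):
    """True if the whole string consists of one or more hex digits."""
    return HEX_NUMBER.fullmatch(text) is not None


if __name__ == "__main__":
    for t in ["00:00", "09:30", "19:59", "23:59", "24:00", "12:60", "7:30"]:
        print(t, is_24h_time(t))

    # Without anchors or fullmatch, the pattern also finds a time inside a
    # longer string: here it matches "23:45" within "123:45".
    print(TIME_24H.search("123:45").group())  # 23:45

    for h in ["1aF", "deadBEEF", "0x1F", "g1", ""]:
        print(repr(h), is_hex_number(h))

    # \d inside the class contributes the same characters that 0-9 would.
    ascii_chars = [chr(i) for i in range(128)]
    print([c for c in ascii_chars if HEX_NUMBER.fullmatch(c)]
          == [c for c in ascii_chars if re.fullmatch(r"[0-9a-fA-F]", c)])  # True
```

In PHP or any other PCRE host, the equivalent of fullmatch is to wrap the pattern in ^ and $ so that a string like 123:45 is rejected instead of partially matched.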