C# in a Nutshell, 2nd Edition

Appendix A. Regular Expressions

The following tables summarize the regular-expression grammar and syntax supported by the regular-expression classes in the System.Text.RegularExpressions namespace. Each of the modifiers and quantifiers in the tables can substantially change how a pattern matches and searches. For further information on regular expressions, we recommend the definitive Mastering Regular Expressions by Jeffrey E. F. Friedl (O'Reilly & Associates, 1997).

All the syntax described in the tables should match the Perl5 syntax, with specific exceptions noted.

Table A-1. Character escapes

Escape code sequence   Meaning                    Hexadecimal equivalent
\a                     Bell                       \u0007
\b                     Backspace                  \u0008
\t                     Tab                        \u0009
\r                     Carriage return            \u000D
\v                     Vertical tab               \u000B
\f                     Form feed                  \u000C
\n                     Newline                    \u000A
\e                     Escape                     \u001B
\040                   ASCII character as octal
\x20                   ASCII character as hex
\cC                    ASCII control character
\u0020                 Unicode character as hex
\non-escape            A nonescape character

Special case: within a regular expression, \b means word boundary, except in a [ ] set, in which \b means the backspace character.
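The two meanings of \b can be seen side by side in a short sketch. The class and method names here (BackspaceEscapeDemo, CountWholeWord, ContainsBackspace) are hypothetical and exist only for illustration; the Regex calls are the standard ones from System.Text.RegularExpressions.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class BackspaceEscapeDemo
{
    // Outside a [ ] set, \b is a zero-width word-boundary assertion,
    // so only whole-word occurrences are counted.
    public static int CountWholeWord(string input, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(input, pattern).Count;
    }

    // Inside a [ ] set, \b stands for the backspace character (\u0008).
    public static bool ContainsBackspace(string input)
    {
        return Regex.IsMatch(input, @"[\b]");
    }
}
```

With this sketch, CountWholeWord("cat concat cat", "cat") returns 2, because the "cat" inside "concat" does not start on a word boundary, and ContainsBackspace("a\bc") returns true because the C# string literal "\b" is a backspace.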

Table A-2. Substitutions

Expression      Meaning
$group-number   Substitutes last substring matched by group-number
${group-name}   Substitutes last substring matched by (?<group-name>)

Substitutions are specified only within a replacement pattern.
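As a minimal sketch of both substitution forms, the following helpers pass a replacement pattern to Regex.Replace. The class and method names (SubstitutionDemo, SwapByNumber, SwapByName) and the group names first and last are assumptions made up for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class SubstitutionDemo
{
    // $1 and $2 refer to the first and second numbered groups.
    public static string SwapByNumber(string name)
    {
        return Regex.Replace(name, @"(\w+) (\w+)", "$2, $1");
    }

    // ${first} and ${last} refer to the named groups of the same names.
    public static string SwapByName(string name)
    {
        return Regex.Replace(name, @"(?<first>\w+) (?<last>\w+)", "${last}, ${first}");
    }
}
```

Both SwapByNumber("Ada Lovelace") and SwapByName("Ada Lovelace") return "Lovelace, Ada". The $ forms have no special meaning in the search pattern itself, only in the replacement argument.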

Table A-3. Character sets

Expression         Meaning
.                  Matches any character except \n
[characterlist]    Matches a single character in the list
[^characterlist]   Matches a single character not in the list
[char0-char1]      Matches a single character in a range
\w                 Matches a word character; for ASCII text, same as [a-zA-Z_0-9]
\W                 Matches a nonword character
\s                 Matches a whitespace character; for ASCII text, same as [ \f\n\r\t\v]
\S                 Matches a nonspace character
\d                 Matches a decimal digit; for ASCII text, same as [0-9]
\D                 Matches a nondigit
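The shorthand classes can be tried out with a few one-line helpers. This is a sketch under one stated assumption that goes beyond the table: in .NET, \d also matches non-ASCII Unicode decimal digits unless RegexOptions.ECMAScript is set, which restricts it to [0-9]. The names CharacterClassDemo, IsDigitDefault, IsDigitEcma, and StripPunctuation are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CharacterClassDemo
{
    // Default behavior: \d accepts any Unicode decimal digit.
    public static bool IsDigitDefault(string s)
    {
        return Regex.IsMatch(s, @"^\d$");
    }

    // With RegexOptions.ECMAScript, \d is limited to [0-9].
    public static bool IsDigitEcma(string s)
    {
        return Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);
    }

    // A negated set combining \w and \s: removes everything that is
    // neither a word character nor whitespace.
    public static string StripPunctuation(string s)
    {
        return Regex.Replace(s, @"[^\w\s]", "");
    }
}
```

StripPunctuation("Hello, world!") returns "Hello world". The Arabic-Indic digit "\u0663" satisfies IsDigitDefault but not IsDigitEcma, while "3" satisfies both.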

Table A-4. Positioning assertions

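Expression   Meaning
^            Beginning of string; in multiline mode, beginning of any line
$            End of string; in multiline mode, end of any line
\A           Beginning of string
\Z           End of string, or just before a final newline
\z           Exactly the end of string
\G           Where search started
\b           On a word boundary
\B           Not on a word boundary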
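The difference between the line anchors and the string anchors is easiest to see on input that contains newlines. The following is a small sketch; AnchorDemo and its methods are hypothetical names, and RegexOptions.Multiline corresponds to the m option.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AnchorDemo
{
    // Counts the words that sit at a position where ^ matches.
    // Without Multiline, ^ matches only at the start of the string.
    public static int CountLineStarts(string text, bool multiline)
    {
        RegexOptions options = multiline ? RegexOptions.Multiline : RegexOptions.None;
        return Regex.Matches(text, @"^\w+", options).Count;
    }

    // \Z tolerates one trailing newline after the match.
    public static bool EndsWithUpperZ(string text)
    {
        return Regex.IsMatch(text, @"end\Z");
    }

    // \z requires the match to finish at the very last character.
    public static bool EndsWithLowerZ(string text)
    {
        return Regex.IsMatch(text, @"end\z");
    }
}
```

For the input "one\ntwo\nthree", CountLineStarts returns 1 without multiline mode and 3 with it. For "the end\n", EndsWithUpperZ returns true while EndsWithLowerZ returns false.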

Table A-5. Quantifiers

Quantifier   Meaning
*            0 or more matches
+            1 or more matches
?            0 or 1 matches
{n}          Exactly n matches
{n,}         At least n matches
{n,m}        At least n, but no more than m matches
*?           Lazy *, finds first match that has minimum repeats
+?           Lazy +, minimum repeats, but at least 1
??           Lazy ?, zero or minimum repeats
{n}?         Lazy {n}, exactly n matches
{n,}?        Lazy {n,}, minimum repeats, but at least n
{n,m}?       Lazy {n,m}, minimum repeats, but at least n, and no more than m
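A short sketch can contrast a greedy quantifier with its lazy counterpart on the same input. QuantifierDemo, FirstTag, and RunOfA are hypothetical names chosen for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class QuantifierDemo
{
    // Greedy .+ runs to the last '>' it can reach;
    // lazy .+? stops at the first '>' that completes a match.
    public static string FirstTag(string html, bool lazy)
    {
        string pattern = lazy ? "<.+?>" : "<.+>";
        return Regex.Match(html, pattern).Value;
    }

    // {2,3} takes as many repeats as allowed; {2,3}? takes as few as allowed.
    public static string RunOfA(string text, bool lazy)
    {
        string pattern = lazy ? "a{2,3}?" : "a{2,3}";
        return Regex.Match(text, pattern).Value;
    }
}
```

For "<b>bold</b>", the greedy call returns the whole string and the lazy call returns "<b>". For "aaaaa", RunOfA returns "aaa" when greedy and "aa" when lazy.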

Table A-6. Grouping constructs

Syntax             Meaning
( )                Capture matched substring
(?<name>)          Capture matched substring into group name^[1]
(?<number>)        Capture matched substring into group number^[1]
(?<name1-name2>)   Undefine name2, and store interval and current group into name1; if name2 is undefined, matching backtracks; name1 is optional^[1]
(?: )              Noncapturing group
(?imnsx-imnsx: )   Apply or disable matching options
(?= )              Continue matching only if subexpression matches on right
(?! )              Continue matching only if subexpression doesn't match on right
(?<= )             Continue matching only if subexpression matches on left
(?<! )             Continue matching only if subexpression doesn't match on left
(?> )              Subexpression is matched once, but isn't backtracked

^[1] Single quotes may be used instead of angle brackets, for example (?'name').

The named capturing group syntax follows a suggestion made by Friedl in Mastering Regular Expressions. All other grouping constructs use the Perl5 syntax.
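The sketch below exercises three of the grouping constructs: a named group, a lookahead, and the (?<name1-name2>) balancing form with name1 omitted. GroupingDemo and its method names are hypothetical, as are the group names year and open. The balanced-parentheses pattern also relies on the conditional construct from Table A-8 to reject input that leaves an opening parenthesis unmatched.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class GroupingDemo
{
    // Named group: read the captured text back through Groups["year"].
    public static string Year(string date)
    {
        Match m = Regex.Match(date, @"(?<year>\d{4})-\d{2}-\d{2}");
        return m.Groups["year"].Value;
    }

    // Lookahead: the digits match only if "px" follows,
    // but "px" is not part of the match.
    public static string PixelValue(string css)
    {
        return Regex.Match(css, @"\d+(?=px)").Value;
    }

    // Balancing group: each '(' pushes a capture onto "open" and each ')'
    // pops one. (?(open)(?!)) fails the match if anything is left unpopped.
    public static bool IsBalanced(string text)
    {
        return Regex.IsMatch(text, @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");
    }
}
```

Year("2003-08-15") returns "2003", and PixelValue("width: 120px; height: 3em") returns "120". IsBalanced accepts "(a(b)c)" and rejects both "(a(b)c" and "a)b".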

Table A-7. Back references

Parameter syntax   Meaning
\count             Back reference to the substring captured by group number count
\k<name>           Named back reference
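Both back-reference forms can be sketched with two small helpers. BackreferenceDemo, FirstRepeatedWord, and HasMatchingQuotes are hypothetical names, and the group name q is likewise made up for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class BackreferenceDemo
{
    // \1 must match the same text that group 1 captured,
    // so this finds doubled words such as "the the".
    public static string FirstRepeatedWord(string text)
    {
        Match m = Regex.Match(text, @"\b(\w+)\s+\1\b");
        return m.Groups[1].Value;
    }

    // \k<q> must match whichever quote character the group q captured.
    public static bool HasMatchingQuotes(string text)
    {
        return Regex.IsMatch(text, @"^(?<q>['""]).*\k<q>$");
    }
}
```

FirstRepeatedWord("Paris in the the spring") returns "the". HasMatchingQuotes returns true for a string such as 'hello' in single quotes and false when the opening and closing quote characters differ.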

Table A-8. Alternation

Expression syntax       Meaning
|                       Logical OR
(?(expression)yes|no)   Matches yes if expression matches, else no; the no is optional
(?(name)yes|no)         Matches yes if named string has a match, else no; the no is optional
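As a minimal sketch, the helpers below show plain alternation and the (?(name)yes) conditional with the no branch omitted. AlternationDemo and its methods are hypothetical names, as is the group name open.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AlternationDemo
{
    // | chooses between alternatives; the noncapturing group
    // limits how far the alternation extends.
    public static bool IsPet(string word)
    {
        return Regex.IsMatch(word, @"^(?:cat|dog)$");
    }

    // If the optional group "open" captured a '(', the conditional
    // demands a closing ')'; otherwise it matches nothing.
    public static bool IsOptionallyParenthesized(string s)
    {
        return Regex.IsMatch(s, @"^(?<open>\()?\d+(?(open)\))$");
    }
}
```

IsOptionallyParenthesized accepts "123" and "(123)" but rejects "(123" and "123)". IsPet accepts "cat" and "dog" but not "catdog".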

Table A-9. Miscellaneous Constructs

Expression syntax    Meaning
(?imnsx-imnsx)       Set or disable options in midpattern
(?# )                Inline comment
# [to end of line]   X-mode comment
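The following sketch puts these constructs into small patterns: an inline option, an inline comment, and x-mode comments. For contrast it also uses the scoped (?i: ) group from Table A-6. InlineOptionDemo and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class InlineOptionDemo
{
    // (?i) switches on case-insensitive matching from that point onward;
    // the (?# ) comment is ignored by the matcher.
    public static bool MatchesHello(string s)
    {
        return Regex.IsMatch(s, @"^(?i)hello(?# inline comment)$");
    }

    // (?i: ) applies the option only inside the group, so only the
    // first letter may vary in case.
    public static bool MatchesHelloFirstLetterOnly(string s)
    {
        return Regex.IsMatch(s, @"^(?i:h)ello$");
    }

    // (?x) enables x-mode: unescaped whitespace is dropped and
    // # starts a comment that runs to the end of the line.
    public static bool IsTime(string s)
    {
        string pattern = @"(?x)   # turn on x-mode
            ^ \d{2}               # hours
            :                     # separator
            \d{2} $               # minutes";
        return Regex.IsMatch(s, pattern);
    }
}
```

MatchesHello("HELLO") returns true, whereas MatchesHelloFirstLetterOnly accepts "Hello" but rejects "HELLO". IsTime("09:30") returns true and IsTime("9:30") returns false.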

Table A-10. Regular expression options

Option   Meaning
i        Case-insensitive match
m        Multiline mode; changes ^ and $ so they match the beginning and end of any line
n        Captures only explicitly named or numbered groups
c        Compile to MSIL
s        Single-line mode; changes meaning of "." so it matches every character
x        Eliminates unescaped whitespace from the pattern
r        Search from right to left; can't be specified in midpattern
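When options are passed in code rather than inline, they are given as RegexOptions flags. The sketch below assumes the usual correspondence between the letters above and the enumeration members (s to Singleline, n to ExplicitCapture, r to RightToLeft); RegexOptionsDemo and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RegexOptionsDemo
{
    // s: with Singleline, "." also matches \n.
    public static bool DotMatchesNewline(bool singleline)
    {
        RegexOptions options = singleline ? RegexOptions.Singleline : RegexOptions.None;
        return Regex.IsMatch("a\nb", "a.b", options);
    }

    // n: with ExplicitCapture, plain ( ) groups stop capturing.
    // The returned count includes group 0, the whole match.
    public static int GroupCount(RegexOptions options)
    {
        return new Regex(@"(\d+)-(?<id>\d+)", options).GetGroupNumbers().Length;
    }

    // r: with RightToLeft, the search starts at the end of the input.
    public static string FirstNumber(string text, RegexOptions options)
    {
        return Regex.Match(text, @"\d+", options).Value;
    }
}
```

GroupCount returns 3 with RegexOptions.None and 2 with RegexOptions.ExplicitCapture. FirstNumber("10 20 30", ...) returns "10" by default and "30" with RegexOptions.RightToLeft.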
