Appendix A. Using Regular Expressions in Apache
A number of the Apache web server's configuration
directives permit (or require!) the use of what are called
regular expressions. Regular expressions are used to
determine if a string, such as a URL or a user's
name, matches a pattern.
There are numerous resources that cover regular expressions in
excruciating detail, so this appendix is not designed to be a
tutorial for their use. Instead, it documents the specific features
of regular expressions used by Apache—what's
available and what isn't. Even though there are
quite a number of regular expression packages, with differing feature
sets, there are some commonalities among them. The Perl language, for
instance, has a particularly rich set of regular expression features, but
only a small subset of them is available in the Apache regex library,
which is different from Perl's.
Regular expressions, as mentioned, are a language that allows you to
determine if a particular string or variable looks like some pattern.
For example, you may wish to determine if a particular string is all
uppercase, or if it contains at least three digits, or perhaps if it
contains the word "monkey" or "Monkey". Regular expressions
provide a vocabulary for talking about these sorts of tests. Most
modern programming languages contain some variety of regular
expression library, and they tend to have a large number of things in
common, although they may differ in small details.
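As a rough illustration of the three tests just mentioned, here is a minimal sketch in Python. Python's re module is neither of the libraries Apache uses; it is only a convenient stand-in, and the patterns stick to basic syntax that most regular expression libraries share. The function names are invented for this example, and "at least three digits" is assumed to mean three digits anywhere in the string, not necessarily adjacent.

```python
import re

# Python's re module is used here only as a stand-in for illustration.
# The patterns use basic syntax common to most regex libraries.
ALL_UPPERCASE = re.compile(r"^[A-Z]+$")            # one or more A-Z, nothing else
THREE_DIGITS = re.compile(r"[0-9].*[0-9].*[0-9]")  # three digits, anything between
MONKEY = re.compile(r"[Mm]onkey")                  # "monkey" or "Monkey"


def is_all_uppercase(s):
    """True if s consists only of the letters A-Z."""
    return ALL_UPPERCASE.search(s) is not None


def has_three_digits(s):
    """True if s contains at least three digits (assumed: not necessarily adjacent)."""
    return THREE_DIGITS.search(s) is not None


def mentions_monkey(s):
    """True if s contains "monkey" or "Monkey"."""
    return MONKEY.search(s) is not None


if __name__ == "__main__":
    for text in ["APACHE", "a1b2c3", "Monkey business"]:
        print(text, is_all_uppercase(text), has_three_digits(text), mentions_monkey(text))
```

Each function answers a yes-or-no question about a string, which is how Apache uses regular expressions in its configuration directives.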
Apache 1.3 uses a regular expression library called hsregex, so called
because it was developed by Henry Spencer. Note that this is the same
regular expression library used in egrep, which is the same thing as
grep on many Unixish platforms.
Apache 2.0 uses a somewhat more full-featured regular expression
library called Perl Compatible
Regular Expressions (PCRE), so called because it implements many of
the features available in the regular expression engine that comes
with the Perl programming language. This appendix does not
attempt to cover all the differences between these two
implementations, but you should know that, as far as functionality
goes, hsregex is a subset of PCRE. Everything you can do with regular
expressions in Apache 1.3, you can also do in 2.0, but not
necessarily the other way around.
To grossly simplify, regular expressions are built from two kinds of
characters. Some characters mean exactly what they say (for example,
a G appearing in a regular expression will usually
mean the literal character G), while others have special
significance (for example, the period (.) is a wildcard that matches
any single character). Regular expressions can be composed
of these characters to represent (almost) any desired pattern
appearing in a string.
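The difference between literal and special characters can be sketched in a few lines of Python. As before, Python's re module is only a stand-in for the libraries Apache uses, and the helper name matches is invented for this example. The last pattern also shows the common convention, shared by most regular expression libraries, of putting a backslash before a special character to make it literal.

```python
import re


def matches(pattern, text):
    """Return True if pattern is found anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None


# A literal character matches only itself.
literal_hit = matches(r"G", "Go")        # "G" is present
literal_miss = matches(r"G", "go")       # lowercase "g" is a different character

# The period is a wildcard: it stands for exactly one character.
wildcard_letter = matches(r"G.t", "Get")
wildcard_dash = matches(r"G.t", "G-t")
wildcard_empty = matches(r"G.t", "Gt")   # nothing between "G" and "t" for "." to match

# A backslash removes the special meaning, so "\." matches only a real period.
escaped_hit = matches(r"G\.t", "G.t")
escaped_miss = matches(r"G\.t", "Get")

if __name__ == "__main__":
    print(literal_hit, literal_miss)
    print(wildcard_letter, wildcard_dash, wildcard_empty)
    print(escaped_hit, escaped_miss)
```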