Recipe 6.20 Restricting Proxy Access to Certain URLs

Problem

You don't want people using your proxy server to access particular URLs or patterns of URLs (such as MP3 or streaming video files).

Solution

You can block by keyword:

ProxyBlock .rm .ra .mp3

You can block by specific backend URLs:

<Directory proxy:http://other-host.org/path>
    Order Allow,Deny
    Deny from all
    Satisfy All
</Directory>

Or you can block according to regular expression pattern matching:

<Directory proxy:*>
    RewriteEngine On
    #
    # Disable proxy access to Real movie and audio files
    #
    RewriteRule "\.(rm|ra)$" "-" [F,NC]
    #
    # Don't allow anyone to access .mil sites through us
    #
    RewriteRule "^[a-z]+://[-.a-z0-9]*\.mil($|/)" "-" [F,NC]
</Directory>

Discussion

With any of these solutions, a client that attempts to access a blocked URL receives a 403 Forbidden status from the server.

The first solution uses a feature built into the proxy module itself: the ProxyBlock directive. It's simple and efficient, and it caches the results so that future accesses to the same URL are blocked with less effort; however, the pattern matching it can perform is extremely limited and prone to confusion. For instance, if you specify:

ProxyBlock .mil

the server denies access to both http://www.navy.mil/ and http://example.com/spec.mil/list.html. This is probably not what was intended!
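
The mod_proxy documentation describes the arguments to ProxyBlock as words, hosts, or domains that are compared against the name of the site being requested, so complete hostnames are the least ambiguous thing to list. A minimal sketch, in which both hostnames are placeholders rather than anything from this recipe:

ProxyBlock news.example.com auctions.example.com

Listing whole hostnames makes accidental substring matches much less likely than a short fragment such as .mil does.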

The second method allows you to impose limitations based on the URL being fetched (or gatewayed, in the case of a ProxyPass directive).
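
In Apache 2.0 and later, the <Directory proxy:...> syntax is replaced by the <Proxy> container. A rough equivalent of the second solution, assuming Apache 2.4 with mod_authz_core loaded and the same backend URL as in the solution, might look like this:

<Proxy "http://other-host.org/path">
    # Apache 2.4 access control; on 2.0 or 2.2, use
    # "Order Allow,Deny" and "Deny from all" instead
    Require all denied
</Proxy>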

The third method, which allows more complex what-to-block patterns to be constructed, is both more flexible and more powerful, but somewhat less efficient. Use it only when the other methods prove insufficient.

<DirectoryMatch> containers work as well, so more complex patterns may be used.
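
As an illustration of the <DirectoryMatch> form, the following sketch denies proxy access to any http URL whose hostname ends in .mil. It assumes that the regular expression is matched against the same proxy:-prefixed string that <Directory proxy:...> uses; the pattern itself is only an example and is case-sensitive as written:

<DirectoryMatch "^proxy:http://[^/:]*\.mil(:[0-9]+)?(/|$)">
    # Hostname ending in .mil, optionally followed by a port
    Order Allow,Deny
    Deny from all
</DirectoryMatch>

Anchoring the pattern to the hostname part of the URL keeps it from matching a path component such as /spec.mil/.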


The flags to the RewriteRule directive tell it, first, that any URL matching the pattern should result in the server returning a 403 Forbidden error (F or forbidden), and second that the pattern match is case-insensitive (NC or nocase).
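
Each flag can also be written in its long form, which some administrators find easier to read. The first rule from the solution, spelled out that way:

<Directory proxy:*>
    RewriteEngine On
    # Same rule as in the solution, with [F,NC] written as [forbidden,nocase]
    RewriteRule "\.(rm|ra)$" "-" [forbidden,nocase]
</Directory>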

One disadvantage of the mod_rewrite solution is that it can be too specific. The first RewriteRule pattern can be defeated if the client specifies path-info or a query string, or if the origin server uses a different suffix naming scheme for these types of files. A little cleverness on your part can cover these sorts of conditions, but beware of trying to squeeze too many possibilities into a single regular expression pattern. It's generally better to have multiple RewriteRule directives than a single all-singing, all-dancing one that no one can read and that is therefore prone to error.
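
One way to apply that advice is to give each case its own rule. The sketch below assumes, as the discussion above implies, that any path-info or query string is part of the string the pattern is matched against; it is an illustration, not a complete filter:

<Directory proxy:*>
    RewriteEngine On
    # Suffix at the very end of the URL
    RewriteRule "\.(rm|ra)$" "-" [F,NC]
    # Suffix followed by path-info
    RewriteRule "\.(rm|ra)/" "-" [F,NC]
    # Suffix followed by a query string
    RewriteRule "\.(rm|ra)\?" "-" [F,NC]
</Directory>

Note that the second rule also blocks any URL containing a directory component whose name ends in .rm or .ra, which may or may not be acceptable.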

See Also
