mod_cdn
mod_cdn is an apache2 module developed at Voxel that makes VoxCAST and other CDNs easier to use by automatically "CDNifying" some HTML content and "originifying" the origin apache server to send proper headers, validate authentication tokens, etc. mod_cdn is meant to be installed and configured on a CDN customer's origin server. With mod_cdn, turning up a CDN for your website is as easy as setting up a simple apache module.
The source code for mod_cdn is available under the GPL v2 license.
Please direct technical questions about mod_cdn to support@voxel.net. Also, check out the FAQ to see if your questions have already been addressed.
Features
mod_cdn currently has the following features:
- Find links to content (e.g. img src tags) in HTML and rewrite the URLs to point to a different server (e.g., a VoxCAST host)
- Automatically add voxtoken authentication token query string arguments to links in HTML that are being CDNified
- Automatically add query string ignore tokens in the query strings of URLs being CDNified, relative to some other query string argument
- Add proper Expires headers to static content for which the server is the CDN origin
- Add Vox-Authorization: required headers to static content for which the server is the CDN origin, and which requires authentication
- Verify the authentication tokens in query string arguments of static content requests for which the server is the CDN origin
Most sites probably don't need all of these features; they can each be turned on and off and configured at a fairly high granularity.
Another major feature in the works is dynamic threshold based switching between CDN and non-CDN that takes into account server load. Let us know if you think of other features we should add to mod_cdn -- or send us a patch!
While some of mod_cdn's features are specific to VoxCAST, mod_cdn is almost as useful for customers of other CDNs: server remapping, Expires headers, and other mod_cdn features address CDNification tasks common to most CDNs.
Downloads
mod_cdn 1.1.0 (source) [27 KB] -- 2009-11-29
Please note: mod_cdn has mainly been tested on Voxel's apache deployments and may need some tweaking to work on other setups. Please let us know if you encounter problems (or find solutions)!
Installing mod_cdn
There's no package yet for mod_cdn, but we'll make some soon. The
module consists of an apache module (dynamic library) that must be put
in /var/lib/apache2/modules/mod_cdn.so (the path might be
slightly different on your system). To load the module, place cdn.load and cdn.conf in /etc/apache2/mods-available and link to them from mods-enabled. (Again, the procedure might be slightly
different for you -- our instructions are based on a Debian
installation.)
cdn.conf
The global (apache-wide) configuration for mod_cdn in cdn.conf includes a few directives that should work as-is
for pretty much everyone. These are:
CDNHTMLContentType
Defines the set of content types that will be parsed as (X)HTML and
CDNified. The defaults are text/html and application/xhtml+xml. If other content types should be
CDNified (and are really just HTML) then you can add them to the list.
You can also just include the directive elsewhere (e.g. within a
VirtualHost or Directory/Location section) and any new values will be
added to the list.
CDNHTMLLinks
Defines the set of HTML tag/attribute pairs that we expect to
contain links to content. mod_cdn will look for these tags/attributes
and, when found, will CDNify the link if necessary. There are some
tag/attribute pairs that contain URLs, but which we probably don't
want to CDNify, like form/action. These are
left out of the list in cdn.conf. New tag/attribute
pairs can be added either in cdn.conf or elsewhere in the
apache config tree.
Configuration
To configure mod_cdn, there are a variety of apache config
directives that control HTML parsing/replacing and other
originification options. Almost always, these directives should be
applied within a VirtualHost section, and probably even within a
Directory/Location section, especially in the case of CDNActAsOrigin, where often the content you want
offloaded to the CDN is within a particular directory (e.g., /images).
See the example.conf shipped with mod_cdn for some
sample uses of these directives.
CDNHTMLFromServers
CDNHTMLFromServers static.example.com images.example.com ...
By default, any relative links (e.g. "../blah.png") and locally
absolute links ("/path/to/blah.png") that match a regex defined with CDNHTMLRemapURLServer will be CDNified. In addition,
globally absolute links ("http://example.com/path/to/blah.gif") where
the server name matches the current VHost's ServerName or
one of its ServerAliases will be CDNified if they match.
It may be the case that some of the content linked to in the HTML has
URLs on a different server (e.g., "static.example.com"). CDNHTMLFromServers specifies a list of servernames for
which we'll CDNify globally absolute links that match one of the
regexes.
CDNHTMLToServer
CDNHTMLToServer http://1234.voxcdn.com
Set the CDN hostname to which traffic will be redirected. This should be in the form of http[s]://xxxx.voxcdn.com[:port] .
CDNHTMLRemapURLServer
CDNHTMLRemapURLServer regex flags
The main workhorse for CDNification via server remapping. Multiple
instances of this directive can be used. If the regex matches a URL
in the HTML, the URL is remapped to point to the CDNHTMLToServer. The flags affect the regex matching and
the URL rewriting. Flags are individual letters, unseparated by any
spaces, and include:
- 'x': use POSIX extended regular expressions
- 'i': ignore case in matching the regex
- 'a': add voxtoken authentication tokens to the query strings of matching URLs, based on CDNAuthKey
- 'q': add query string ignore tokens to the query strings of matching URLs, based on CDNIgnoreTokenName and CDNHTMLIgnoreTokenLocation
For example, to match all files with a .png extension
and no query string, and add authentication tokens to them:
CDNHTMLRemapURLServer \.png$ ia
To match .pngs with a query string and also add a query string ignore token:
CDNHTMLRemapURLServer \.png\? iaq
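The three HTML remapping directives above are typically used together. The following is a minimal sketch of how they might be combined in a single VirtualHost; the hostnames, the CDN host, and the choice of file extensions are placeholders, not values taken from the shipped example.conf.

```apache
# Hypothetical vhost: all hostnames below are placeholders.
<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias example.com

    # Also CDNify absolute links that point at this separate static host.
    CDNHTMLFromServers static.example.com

    # Matching links are rewritten to point at this CDN host.
    CDNHTMLToServer http://1234.voxcdn.com

    # Remap URLs ending in .png or .css, ignoring case ('i' flag).
    CDNHTMLRemapURLServer \.png$ i
    CDNHTMLRemapURLServer \.css$ i
</VirtualHost>
```

With a setup like this, relative links, locally absolute links, and absolute links to www.example.com, example.com, or static.example.com should all be candidates for remapping when they match one of the two regexes.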
CDNHTMLDocType
CDNHTMLDocType [HTML|XHTML] [legacy]
Set the DOCTYPE tag that will be inserted at the top of the HTML:
- HTML: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
- HTML+legacy: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
- XHTML: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- XHTML+legacy: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
- other: anything other than "HTML" or "XHTML" (with optional "legacy") will be inserted as the DOCTYPE tag directly, at the top of the document.
If this directive is omitted, no DOCTYPE is inserted, and any DOCTYPE already present in the document is dropped rather than passed through.
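As a brief illustration of the keyword forms listed above, a site serving transitional XHTML might use something like the following. This is only a sketch based on the syntax line for this directive.

```apache
# Insert the XHTML 1.0 Transitional DOCTYPE at the top of each parsed document.
CDNHTMLDocType XHTML legacy
```

Because an existing DOCTYPE is not passed through, setting this directive is probably worthwhile for any site whose pages rely on a particular DOCTYPE for rendering.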
CDNHTMLDefaultCharset
CDNHTMLDefaultCharset utf-8
Specify the default input character set to be used if one cannot be automatically detected. Currently, mod_cdn always outputs UTF-8.
CDNHTMLStartParse
CDNHTMLStartParse tagname
Specify an HTML tag such that anything in front of the first occurrence of the tag in the document is ignored. For example:
CDNHTMLStartParse html
ignores anything that comes before the <html> tag.
CDNAuthKey
CDNAuthKey string
Specify the authentication key (assigned by Voxel or set via the CDN portal) for generating and verifying authentication tokens. Note that in using mod_cdn's authentication capabilities, caution needs to be taken in changing the authentication key for the CDN host.
CDNAuthAlt
CDNAuthAlt http://example.com/failed-auth.html
Specify a URL to which a user will be redirected if a request for
an object fails authentication (i.e., the voxtoken supplied with the request is incorrect).
CDNAuthExpire
CDNAuthExpire 900
Set the default expiration time in seconds for generated authentication tokens.
CDNHTMLAddAuthTokens
CDNHTMLAddAuthTokens on
This is just a shortcut that is the same as setting the 'a' flag
for every CDNHTMLRemapURLServer directive.
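The authentication directives above can be combined roughly as follows. This is a sketch under assumptions: the key value, hostnames, and file extensions are placeholders, and the real key is the one assigned for your CDN host.

```apache
# Hypothetical vhost: key and hostnames are placeholders.
<VirtualHost *:80>
    ServerName www.example.com
    CDNHTMLToServer http://1234.voxcdn.com

    # Key used to generate and verify voxtoken values.
    CDNAuthKey replace-with-your-key

    # Generated tokens expire after 900 seconds (15 minutes).
    CDNAuthExpire 900

    # Where users are redirected when a token fails verification.
    CDNAuthAlt http://example.com/failed-auth.html

    # Same effect as adding the 'a' flag to each remap directive below.
    CDNHTMLAddAuthTokens on
    CDNHTMLRemapURLServer \.png$ i
    CDNHTMLRemapURLServer \.jpg$ i
</VirtualHost>
```

Using CDNHTMLAddAuthTokens keeps the per-regex flags short when every remapped URL needs a token; the 'a' flag remains the better fit when only some content is protected.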
CDNIgnoreTokenName
CDNIgnoreTokenName string
Set the name of a query string argument that will be inserted into
the query string of remapped URLs when the 'q' flag is set (or for
every remapped URL if CDNHTMLAddIgnoreTokens is set).
(What will actually be inserted is "string=ignore" since VoxCAST
currently expects the ignore token argument to include a value.)
CDNHTMLIgnoreTokenLocation
CDNHTMLIgnoreTokenLocation [before|after] arg-name
Set the relative insertion point of the ignore token into the query
string when the ignore token is being added to a remapped URL. The
token is inserted either before or after some other query string
argument (arg-name). For example, to insert the ignore token
before timestamp in a query string like
"?id=52&type=png&timestamp=1220435841":
CDNHTMLIgnoreTokenLocation before timestamp
CDNHTMLAddIgnoreTokens
CDNHTMLAddIgnoreTokens on
This is just a shortcut that is the same as setting the 'q' flag
for every CDNHTMLRemapURLServer directive.
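Putting the ignore-token directives together might look like the sketch below. The argument name "nocache", the CDN host, and the sample URL are hypothetical, and the rewritten URL in the comments is an expectation inferred from the descriptions above rather than captured output.

```apache
CDNHTMLToServer http://1234.voxcdn.com

# Hypothetical ignore token name; it is inserted as "nocache=ignore".
CDNIgnoreTokenName nocache

# Place the ignore token just before the "timestamp" argument.
CDNHTMLIgnoreTokenLocation before timestamp

# Remap .png URLs that carry a query string and add the ignore token ('q').
CDNHTMLRemapURLServer \.png\? iq

# A link such as
#   /chart.png?id=52&type=png&timestamp=1220435841
# would then be expected to come out roughly as
#   http://1234.voxcdn.com/chart.png?id=52&type=png&nocache=ignore&timestamp=1220435841
```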
CDNActAsOrigin
CDNActAsOrigin regex flags [expiration-time]
The main workhorse for "originification", i.e., serving requests from the CDN for which the server running mod_cdn is the origin. Responses to requests whose URL matches the regex are modified according to the flags. Flags include:
- 'x': use POSIX extended regular expressions
- 'i': ignore case in matching the regex
- 'e': add an Expires header to the response. If the expiration-time value is set, this will be used to compute the time of expiration; otherwise, the value set with CDNDefaultExpire is used.
- 'a': add a Vox-Authorization: required header to the response (with an alternative URL if specified by CDNAuthAlt), and verify that voxtoken is set and correct in the request's query string; if not, return a 403 response code.
CDNDefaultExpire
CDNDefaultExpire seconds
Set the default expiration time in seconds for content that matches
one of the CDNActAsOrigin regexes.
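The origin-side directives might be combined as in the sketch below. The paths, key, and expiration values are placeholders. It also assumes that the optional expiration-time argument is given in seconds like CDNDefaultExpire, and that the regex is matched against the request path; neither is confirmed above.

```apache
# Hypothetical vhost: paths, key, and times are placeholders.
<VirtualHost *:80>
    ServerName www.example.com
    CDNAuthKey replace-with-your-key

    # Fallback lifetime for 'e' matches without their own expiration-time.
    CDNDefaultExpire 3600

    <Location /images>
        # Extended regex ('x'), case-insensitive ('i'), add Expires ('e')
        # with an explicit lifetime of 86400 (assumed to be seconds).
        CDNActAsOrigin \.(png|jpe?g|gif)$ xie 86400

        # No expiration-time given, so CDNDefaultExpire applies.
        CDNActAsOrigin \.ico$ ie
    </Location>

    <Location /protected>
        # Require a valid voxtoken ('a'); failures get a 403 response.
        CDNAuthAlt http://example.com/failed-auth.html
        CDNActAsOrigin /protected/ a
    </Location>
</VirtualHost>
```

Scoping each directive to a Location section keeps the header changes and token checks limited to the content that is actually offloaded to the CDN.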
FAQ
Here are the answers to some tricky questions you might have while implementing mod_cdn.
How do I compile mod_cdn?
Currently there are no packages for mod_cdn, although we're in the
process of making some for, at least, Debian and CentOS. That means
you need to compile mod_cdn from the source. To do this, you need to
install APR, the Apache Portable Runtime library. On a Debian system, the
package is libapr1-dev; you may also need libaprutil1-dev. You should
compile mod_cdn against Apache 2.2.7 or higher. (Our testing has
mainly been with 2.2.8.) Then, just run make.
How do I set up mod_cdn on CentOS/Fedora/Redhat?
The installation instructions above are for Debian systems.
Everything is mostly the same on a CentOS system, except for the
layout of the Apache configuration. Take the lines from
cdn.load and put them in
/etc/httpd/conf/httpd.conf near the other
LoadModule lines. (Don't forget to make sure that
libxml2 is installed and available at
/usr/lib/libxml2.so.2; if it's available elsewhere,
change the LoadFile line.) Put cdn.conf in
/etc/httpd/conf.d. Put mod_cdn.so in
/usr/lib/httpd/modules. That should do it; put the
mod_cdn configuration directives wherever your normal
VirtualHost configuration resides.
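For orientation, the lines copied from cdn.load into httpd.conf would look something like the sketch below. The module identifier "cdn_module" is an assumption; use the exact lines from the cdn.load shipped with mod_cdn, adjusting only the paths.

```apache
# /etc/httpd/conf/httpd.conf, near the other LoadModule lines.

# libxml2 must be loaded before the module; adjust the path if it lives elsewhere.
LoadFile /usr/lib/libxml2.so.2

# "cdn_module" is assumed here; copy the real identifier from cdn.load.
LoadModule cdn_module /usr/lib/httpd/modules/mod_cdn.so
```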
Some assets are still being fetched from my origin and not the CDN. Why?
Assuming your regexes are correct, the assets in question are
probably referenced in either CSS or Javascript, rather than in your
site's HTML. mod_cdn speaks (X)HTML but does not currently
parse CSS or Javascript, either embedded or in separate files. This
means, for example, that if you use background-image or
other URL-referencing attributes in your CSS, mod_cdn won't rewrite
those URLs. Remember: mod_cdn is a quick-and-dirty way to take a lot
of the load off your origin. More complicated setups will need custom
work to truly offload all the intended content delivery to the
CDN.
mod_cdn is doing really weird things to my Javascript.
Because mod_cdn is based on libxml2, it's subject to some of the limitations of that library with respect to XML parsing. One problem that's been reported occurs with HTML looking something like this:
<div>
<script type="text/javascript">
...
document.write('<div>blah</div>');
...
</script>
</div>
Here, the </div> within the Javascript is
erroneously treated as non-CDATA by libxml2, which tries to fix it by
moving the </script> earlier. Other XML libraries
seem to have the same problem. A simple workaround is to use the old
Javascript/HTML comment trick so mod_cdn ignores Javascript with HTML
embedded in it:
<div>
<script type="text/javascript">
// <!--
...
document.write('<div>blah</div>');
...
// -->
</script>
</div>
I'm seeing gibberish or weird encoding errors in HTML parsed by mod_cdn.
This might be due to interactions between mod_cdn and another Apache module. Some users have reported problems with mod_deflate in some versions of Apache, although we've used mod_cdn and mod_deflate together successfully with Apache 2.2.8. Try temporarily disabling unnecessary modules. If that fixes your problem, please let us know the details of your situation and we'll try to address the problem permanently.

