Commit 1a5f455d authored by mikeytown2

Initial commit of boost-7.x

parent d9c7aec0
// $Id$
REQUIREMENTS
------------
This version of Boost is designed for Drupal 4.7 running on a Unix platform.
Drupal's clean URLs MUST be enabled and working properly.
The `path' and `pathauto' modules are recommended.
In order for the static files to be correctly expired, the Drupal cron job
must be correctly set up to execute more often than, or as often as, the
cache lifetime interval.
Since the static page caching is implemented with mod_rewrite directives,
Apache version 1.3 or 2.x with mod_rewrite enabled is required (if Drupal's
clean URLs work for you, you're fine; if not, get them working first).
Other web servers, such as Lighttpd, are NOT supported at present.
The `drush' module is required for (optional) command line usage.
INSTALLATION
------------
1. Go to administer >> settings and ensure that Drupal's clean URLs are
enabled and working properly on your site.
2. Copy all the module files into a subdirectory called modules/boost/
under your Drupal installation directory.
3. Go to administer >> modules and enable the Boost module.
4. Go to administer >> settings >> boost to review and change the
configuration options to your liking.
5. Go to administer >> settings and enable static caching.
6. Log out from Drupal (or use another browser) and browse around your site
as the anonymous user. Ensure that static files are indeed being
generated into the Boost cache directory.
7. IMPORTANT: replace your .htaccess file in the Drupal installation
directory with the file from modules/boost/htaccess/boosted.txt.
(If you fail to do this, static page caching will NOT work!)
8. (See README.txt for information on submitting bug reports.)
// $Id$
NOTE: this module is currently in an alpha state. Come back in a bit unless
you're an experienced user and don't mind figuring things out on your own.
DESCRIPTION
-----------
This module provides static page caching for Drupal 4.7, enabling a
potentially very significant performance and scalability boost for
heavily-trafficked Drupal sites.
For an introduction, read the original blog post at:
http://bendiken.net/2006/05/28/static-page-caching-for-drupal
FEATURES
--------
* Maximally fast page serving for the anonymous visitors to your Drupal
site, reducing web server load and boosting your site's scalability.
* On-demand page caching (static file created after first page request).
* Full support for multi-site Drupal installations.
* Command line administration support (requires the drush module).
INSTALLATION
------------
Please refer to the accompanying file INSTALL.txt for installation
requirements and instructions.
HOW IT WORKS
------------
Once Boost has been installed and enabled, page requests by anonymous
visitors will be cached as static HTML pages on the server's file system.
Periodically (when the Drupal cron job runs), stale pages (i.e. files
exceeding the maximum cache lifetime setting) are purged, so that each one
is recreated the next time an anonymous visitor requests it.
New rewrite rules are added to the .htaccess file supplied with Drupal,
directing the web server to try and fulfill page requests by anonymous
visitors first and foremost from the static page cache, and to only pass the
request through to Drupal if the requested page is not cacheable, hasn't yet
been cached, or the cached copy is stale.
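As a rough illustration of the purge step described above, here is a
minimal sketch in PHP. The function names example_is_stale() and
example_purge_stale() are hypothetical and are not part of Boost; the sketch
also assumes a flat directory of .html files, whereas the module itself
walks the cache tree recursively.

```php
<?php
/**
 * Returns TRUE if a cached file is older than the allowed lifetime.
 *
 * $lifetime is in seconds; $now can be injected to make the check testable.
 */
function example_is_stale($filename, $lifetime, $now = NULL) {
  $now = ($now === NULL) ? time() : $now;
  clearstatcache();
  $mtime = @filemtime($filename);
  if ($mtime === FALSE) {
    return FALSE; // no such file, so there is nothing to purge
  }
  return ($now - $mtime) > $lifetime;
}

/**
 * Deletes every stale *.html file directly inside $dir.
 *
 * Symbolic links are skipped. Returns the number of files removed.
 */
function example_purge_stale($dir, $lifetime, $now = NULL) {
  $removed = 0;
  $files = glob($dir . '/*.html');
  foreach (($files ? $files : array()) as $file) {
    if (is_file($file) && !is_link($file) && example_is_stale($file, $lifetime, $now)) {
      if (@unlink($file)) {
        $removed++;
      }
    }
  }
  return $removed;
}
```

A cron run would call something like example_purge_stale() with the
configured cache lifetime; this is why cron has to run at least as often as
that lifetime for stale pages to disappear promptly.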
FILE SYSTEM CACHE
-----------------
The cached files are stored (by default) in the cache/ directory under your
Drupal installation directory. The Drupal pages' URL paths are translated
into file system names in the following manner:
http://mysite.com/
=> cache/mysite.com/0/index.html
http://mysite.com/about
=> cache/mysite.com/0/about.html
http://mysite.com/about/staff
=> cache/mysite.com/0/about/staff.html
http://mysite.com/node/42
=> cache/mysite.com/0/node/42.html
You'll note that the directory path includes the Drupal site name, enabling
support for multi-site Drupal installations. The zero that follows, on the
other hand, denotes the user ID the content has been cached for -- in this
case the anonymous user (which is the default, and only, choice available
for the time being).
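The mapping above can be sketched as a small standalone PHP function. This
is only an illustration: example_cache_file() is a hypothetical name, the
cache root and the .html extension are taken from the examples above, and
an empty URL path is assumed to mean the front page.

```php
<?php
/**
 * Translates a page URL into the relative path of its cached file.
 *
 * $user_id defaults to 0, the anonymous user.
 */
function example_cache_file($url, $cache_root = 'cache', $user_id = 0) {
  $parts = parse_url($url);
  $host = $parts['host'];
  // Strip the surrounding slashes so that "/about/" and "/about" agree.
  $path = isset($parts['path']) ? trim($parts['path'], '/') : '';
  if ($path === '') {
    $path = 'index'; // the front page is stored as index.html
  }
  return implode('/', array($cache_root, $host, $user_id, $path)) . '.html';
}
```

For example, example_cache_file('http://mysite.com/node/42') yields
cache/mysite.com/0/node/42.html, matching the listing above.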
DISPATCH MECHANISM
------------------
For each incoming page request, the new Apache mod_rewrite directives in
.htaccess will check if a cached version of the requested page should be
served as per the following simple rules:
1. First, we check that the HTTP request method being used is GET.
POST requests are not cacheable, and are passed through to Drupal.
2. Next, we make sure that the URL doesn't contain a query string (i.e.
the part after the `?' character, such as `?q=cats+and+dogs'). A query
string implies dynamic data, and any request that contains one will
be passed through to Drupal. (This also allows one to easily obtain the
current, non-cached version of a page by simply adding a bogus query
string to a URL path -- very useful for testing purposes.)
3. Since only anonymous visitors can benefit from the static page cache at
present, we check that the page request doesn't include a cookie that
is set when a user logs in to the Drupal site. If the cookie is
present, we simply let Drupal handle the page request dynamically.
4. Now, for the important bit: we check whether we actually have a cached
HTML file for the request URL path available in the file system cache.
If we do, we direct the web server to serve that file directly and to
terminate the request immediately after; in this case, Drupal (and
indeed PHP) is never invoked, meaning the page request will be served
by the web server itself at full speed.
5. If, however, we couldn't locate a cached version of the page, we just
pass the request on to Drupal, which will serve it dynamically in the
normal manner.
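To make the order of these checks concrete, the rules above can be written
out as a plain PHP function. Note that this is purely illustrative: the
real dispatch happens in mod_rewrite before PHP is ever started, and the
name example_dispatch() and its parameters are assumptions made for this
sketch.

```php
<?php
/**
 * Decides whether a request would be served statically or by Drupal.
 *
 * Returns the string 'static' or 'drupal'.
 */
function example_dispatch($method, $query_string, $has_login_cookie, $cache_file) {
  // Rule 1: only GET requests are cacheable.
  if ($method !== 'GET') {
    return 'drupal';
  }
  // Rule 2: any query string implies dynamic data.
  if ($query_string !== '') {
    return 'drupal';
  }
  // Rule 3: logged-in users always get dynamic pages.
  if ($has_login_cookie) {
    return 'drupal';
  }
  // Rule 4: serve the cached file if one exists (is_file follows symlinks).
  if (is_file($cache_file)) {
    return 'static';
  }
  // Rule 5: nothing cached yet, so fall through to Drupal.
  return 'drupal';
}
```

Because the cheap header checks come first, the file system is only
consulted for requests that could be served from the cache at all.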
IMPORTANT NOTES
---------------
* Drupal URL aliases get written out to disk as relative symbolic links
pointing to the file representing the internal Drupal URL path. For this
to work correctly with Apache, ensure your .htaccess file contains the
following line (as it will by default if you've installed the file shipped
with Boost):
Options +FollowSymLinks
* To check whether you got a static or dynamic version of a page, look at
the very end of the page's HTML source. You have the static version if the
last line looks like this:
<!-- Page cached by Boost at 2006-11-24 15:06:31 -->
* If your Drupal URL paths contain non-ASCII characters, you may have to
tweak your locale settings on the server in order to ensure the URL paths
get correctly translated into directory paths on the file system.
Non-ASCII URL paths have currently not been tested at all and feedback on
them would be appreciated.
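The static-versus-dynamic check described in the notes above can also be
scripted. The following PHP sketch assumes the banner has exactly the
format shown in the example comment; example_boost_timestamp() is a
hypothetical helper, not something shipped with Boost.

```php
<?php
/**
 * Extracts the Boost cache timestamp from the end of a page's HTML.
 *
 * Returns the timestamp string, or NULL if the page carries no banner
 * (i.e. it was generated dynamically).
 */
function example_boost_timestamp($html) {
  $pattern = '/<!-- Page cached by Boost at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) -->\s*$/';
  if (preg_match($pattern, $html, $matches)) {
    return $matches[1];
  }
  return NULL;
}
```

Fetching the same URL once normally and once with a bogus query string,
then comparing the results of this helper, is a quick way to confirm that
the rewrite rules are active.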
LIMITATIONS
-----------
* Only anonymous visitors will be served cached versions of pages; logged-in
users will get dynamic content. This may somewhat limit the usefulness of
this module for those community sites that require user registration and
login for active participation.
* Only content of the type `text/html' will get cached at present. RSS feeds
and URL paths that have some other content type (e.g. set by a third-party
module) will be silently ignored by Boost.
* In contrast to Drupal's built-in caching, static caching will lose any
additional HTTP headers set for an HTML page by a module. This is unlikely
to be a problem except for some very specific modules and rare use cases.
* Web server software other than Apache is not supported at the moment.
Adding Lighttpd support would be desirable but is not a high priority for
the author at present (see TODO.txt). (Note that while the LiteSpeed web
server has not been specifically tested by the author, it may, in fact,
work, since they claim to support .htaccess files and to have mod_rewrite
compatibility. Feedback on this would be appreciated.)
* At the moment, Windows users are S.O.L. due to the use of symlinks and
Unix-specific shell commands. The author has no personal interest in
supporting Windows but will accept well-documented, non-detrimental
patches to that effect.
BUG REPORTS
-----------
Post feature requests and bug reports to the issue tracking system at:
http://drupal.org/node/add/project_issue/boost
CREDITS
-------
Developed and maintained by Arto Bendiken <http://bendiken.net/>
// $Id$
This is a listing of known bugs, features that mostly work but are still
somewhat in progress, features that are being considered or planned for
implementation, and just miscellaneous far-out ideas that could, in
principle, be implemented if one had the time and inclination to do so.
(NOTE: there is no guarantee any of these items will, in fact, be
implemented, nor should any possible scheduling indications be construed as
promises under any circumstances. TANSTAAFL. If you absolutely need
something implemented right now, please contact the developers to see if
they're available for contract work, or if perhaps a modest donation could
speed things along.)
TODO: IN THE WORKS
------------------
* Finish up administration interface for pre-generating static files for all
pages on the Drupal site in one go.
* Test interaction with other modules that also make use of ob_start().
TODO: FUTURE IDEAS
------------------
* Add a node-specific cache lifetime setting.
* Add admin-visible block for updating the cached copy of the current page.
* Web servers other than Apache are not supported at the moment. This is due
to the way the cache dispatch is implemented using Apache mod_rewrite
directives in the .htaccess file. Lighttpd support would be desirable but
is not a high priority for the developer at present.
* User-specific static page cache. Could conceivably be implemented based
on the existing user-cookie mechanism, though security would be a concern.
* Don't delete the entire file system cache when Boost is disabled; just
rename the site's cache directory with the suffix `.disabled', speeding up
cache regeneration once the module is re-enabled?
// $Id$
This lists some of the users of Boost, describing the setup of the website
in question as well as providing rationale on how Boost benefits the site.
It is hoped that the cases described here may serve as useful guides for new
Boost users evaluating how to best implement static caching on their site.
Stand Against Poverty <http://www.standagainstpoverty.org>
United Nations campaign website used to organize events and report
attendance for events in which nearly 117 million people worldwide
participated. The events seek to raise awareness about poverty and
highlight efforts around the United Nations Millennium Development Goals,
which aim to reduce global poverty. The website runs on a cluster
of 4 load-balanced Apache web servers and a single database server.
Boost is used to reduce the overall resource usage consumed by anonymous
visitors on the site in order to devote more infrastructure resources
toward event organizers who sign in as authenticated users to create
events and report event attendance tallies. The Stand Against Poverty
site is in 4 languages, and uses the i18n module. Boost was used with
this patch http://drupal.org/node/174380#comment-663794 to allow for the
use of Boost on i18n sites. While there is still a substantial amount of
traffic on the website during the 3-day campaign, the impact of
anonymous traffic (which includes all traffic, until users sign in) is
greatly reduced.
Hosting infrastructure for StandAgainstPoverty.org provided by the good
folks at http://www.advomatic.com
Development Seed writes more about the campaign on their blog at:
http://www.developmentseed.org/blog/2008/oct/22/united-nations-uses-drupal-huge-anti-poverty-event
http://www.developmentseed.org/blog/2008/oct/23/improving-drupals-performance-boost-module-uns-millennium-campaign
Environmental Working Group <http://www.ewg.org/>
The Environmental Working Group (EWG) uses the power of public
information to protect public health and the environment. Boost is used
to cache all public-facing pages on the site (13,000+ and counting) and
has been critical in sustaining EWG's large amounts of traffic since the
site relaunched using Drupal in early 2007. EWG frequently receives
traffic from multiple press outlets on a given day and Boost allows EWG
to manage its infrastructure in-house and at a fraction of the price
that would otherwise be required.
Arto Bendiken <http://bendiken.net/>
Personal website of the author. Boost is used to cache virtually every
page on the site, quite significantly improving response times despite
the sometimes sluggish shared hosting the site runs on. An additional
benefit provided by Boost is that when the backend MySQL database server
goes down, as happens now and then, the site still keeps on trucking
instead of just showing the Drupal database error page (dynamic features
such as posting comments obviously don't work until MySQL access is
restored, however).
If you would like to add your website to this list, please contact the
author at http://drupal.org/user/26089/contact, describing your site and
setup. Try to keep the description to a paragraph or two, and don't forget
to include your name and the URL to your website. Note that additions to
this list are posted at the author's sole discretion, and submissions may be
abridged or edited for grammar.
<?php
// $Id$
/**
* @file
* Implements the Boost API for static page caching.
*/
//////////////////////////////////////////////////////////////////////////////
// BOOST API
/**
* Determines whether a given Drupal page can be cached or not.
*
* To avoid potentially troublesome situations, the user login page is never
* cached, nor are any admin pages. At present, we also refuse to cache any
* RSS feeds provided by Drupal, since they would require special handling
* in the mod_rewrite ruleset as they shouldn't be sent out using the
* text/html content type.
*/
function boost_is_cacheable($path) {
$alias = drupal_get_path_alias($path);
$path = drupal_get_normal_path($path); // normalize path
// Never cache the basic user login/registration pages or any administration pages
if ($path == 'user' || preg_match('!^user/(login|register|password)!', $path) || preg_match('!^admin!', $path))
return FALSE;
// At present, RSS feeds are not cacheable due to content type restrictions
if ($path == 'rss.xml' || preg_match('!/feed$!', $path))
return FALSE;
// Don't cache comment reply pages
if (preg_match('!^comment/reply!', $path))
return FALSE;
// Match the user's cacheability settings against the path
if (BOOST_CACHEABILITY_OPTION == 2) {
$result = drupal_eval(BOOST_CACHEABILITY_PAGES);
return !empty($result);
}
$regexp = '/^('. preg_replace(array('/(\r\n?|\n)/', '/\\\\\*/', '/(^|\|)\\\\<front\\\\>($|\|)/'), array('|', '.*', '\1'. preg_quote(variable_get('site_frontpage', 'node'), '/') .'\2'), preg_quote(BOOST_CACHEABILITY_PAGES, '/')) .')$/';
return !(BOOST_CACHEABILITY_OPTION xor preg_match($regexp, $alias));
}
/**
* Determines whether a given Drupal page is currently cached or not.
*/
function boost_is_cached($path) {
$path = (empty($path) ? BOOST_FRONTPAGE : $path);
$alias = drupal_get_path_alias($path);
$path = drupal_get_normal_path($path); // normalize path
// TODO: also determine if alias/symlink exists?
return file_exists(boost_file_path($path));
}
/**
* Deletes all static files currently in the cache.
*/
function boost_cache_clear_all() {
clearstatcache();
if (($cache_dir = boost_cache_directory()) && file_exists($cache_dir)) {
return _boost_rmdir_rf($cache_dir);
}
}
/**
* Deletes all expired static files currently in the cache.
*/
function boost_cache_expire_all() {
clearstatcache();
if (($cache_dir = boost_cache_directory()) && file_exists($cache_dir)) {
_boost_rmdir_rf($cache_dir, 'boost_file_is_expired');
}
return TRUE;
}
/**
* Expires the static file cache for a given page, or multiple pages
* matching a wildcard.
*/
function boost_cache_expire($path, $wildcard = FALSE) {
// TODO: handle wildcard.
$alias = drupal_get_path_alias($path);
$path = drupal_get_normal_path($path); // normalize path
$filename = boost_file_path($path);
if (file_exists($filename))
@unlink($filename);
if ($alias != $path) {
$symlink = boost_file_path($alias);
if (is_link($symlink))
@unlink($symlink);
}
return TRUE;
}
/**
* Returns the cached contents of the specified page, if available.
*/
function boost_cache_get($path) {
$path = drupal_get_normal_path($path); // normalize path
$filename = boost_file_path($path);
if (file_exists($filename) && is_readable($filename))
return file_get_contents($filename);
return NULL;
}
/**
* Replaces the cached contents of the specified page, if stale.
*/
function boost_cache_set($path, $data = '') {
// Append the Boost footer with the current timestamp
$data = rtrim($data) . "\n" . str_replace('%date', date('Y-m-d H:i:s'), BOOST_BANNER);
// Execute the pre-process function if one has been defined
if (function_exists(BOOST_PRE_PROCESS_FUNCTION))
$data = call_user_func(BOOST_PRE_PROCESS_FUNCTION, $data);
$alias = drupal_get_path_alias($path);
$path = drupal_get_normal_path($path); // normalize path
// Create or update the static file as needed
$filename = boost_file_path($path);
_boost_mkdir_p(dirname($filename));
if (!file_exists($filename) || boost_file_is_expired($filename)) {
if (file_put_contents($filename, $data) === FALSE) {
watchdog('boost', t('Unable to write file: %file', array('%file' => $filename)), WATCHDOG_WARNING);
}
}
// If a URL alias is defined, create that as a symlink to the actual file
if ($alias != $path) {
$symlink = boost_file_path($alias);
_boost_mkdir_p(dirname($symlink));
if (!is_link($symlink) || realpath(readlink($symlink)) != realpath($filename)) {
if (file_exists($symlink))
@unlink($symlink);
if (!_boost_symlink($filename, $symlink)) {
watchdog('boost', t('Unable to create symlink: %link to %target', array('%link' => $symlink, '%target' => $filename)), WATCHDOG_WARNING);
}
}
}
return TRUE;
}
/**
* Returns the full directory path to the static file cache directory.
*/
function boost_cache_directory($user_id = 0, $host = NULL) {
global $user, $base_url;
$user_id = 0; //(!is_null($user_id) ? $user_id : BOOST_USER_ID);
$parts = parse_url($base_url);
$host = (!empty($host) ? $host : $parts['host']);
// FIXME: correctly handle Drupal subdirectory installations.
return implode('/', array(getcwd(), BOOST_FILE_PATH, $host, $user_id));
}
/**
* Returns the static file path for a Drupal page.
*/
function boost_file_path($path) {
if ($path == BOOST_FRONTPAGE)
$path = 'index'; // special handling for Drupal front page
return implode('/', array(boost_cache_directory(), $path)) . BOOST_FILE_EXTENSION;
}
/**
* Returns the age of a cached file, measured in seconds since it was last
* updated.
*/
function boost_file_get_age($filename) {
return time() - filemtime($filename);
}
/**
* Determines whether a cached file has expired, i.e. whether its age exceeds
* the maximum cache lifetime as defined by Drupal's system settings.
*/
function boost_file_is_expired($filename) {
if (is_link($filename))
return FALSE;
return boost_file_get_age($filename) > variable_get('cache_lifetime', 600);
}
//////////////////////////////////////////////////////////////////////////////
<?php
// $Id$
/**
* @file
* Actions for managing the static page cache provided by the Boost module.
*/
//////////////////////////////////////////////////////////////////////////////
/**
* Lists all files currently in the Boost static file system cache.
*/
function drush_boost_list() {
// TODO: implementation.
}
/**
* Expires all files, or all files matching a given path, from the static page cache.
*/
function drush_boost_expire($path = NULL) {
drush_op('boost_cache_expire', $path, TRUE);
if (DRUSH_VERBOSE) {
drush_print(empty($path) ? t('Boost static page cache fully cleared.') :
t("Boost cached pages like `%path' expired.", array('%path' => $path)));
}
}
/**
* Enables the Boost static page cache.
*/
function drush_boost_enable() {
drush_op('variable_set', 'boost', CACHE_ENABLED);
if (DRUSH_VERBOSE) {
drush_print(t('Boost static page cache enabled.'));
}
}
/**
* Disables the Boost static page cache.
*/
function drush_boost_disable() {
drush_op('variable_set', 'boost', CACHE_DISABLED);
if (DRUSH_VERBOSE) {
drush_print(t('Boost static page cache disabled.'));
}
}
//////////////////////////////////////////////////////////////////////////////
<?php
// $Id$
/**
* @file
* Various helper functions for the Boost module, to make life a bit easier.
*/
//////////////////////////////////////////////////////////////////////////////
/**
* Recursive version of mkdir(), compatible with PHP4.
*/
function _boost_mkdir_p($pathname, $mode = 0775, $recursive = TRUE) {
if (is_dir($pathname)) return TRUE;
if ($recursive && !_boost_mkdir_p(dirname($pathname), $mode)) return FALSE;
if ($result = @mkdir($pathname, $mode))
@chmod($pathname, $mode);
return $result;
}
/**
* Recursive version of rmdir(); use with extreme caution.
*
* @param $dirname
* the top-level directory that will be recursively removed
* @param $callback
* optional predicate function for determining if a file should be removed
*/
function _boost_rmdir_rf($dirname, $callback = NULL) {
$empty = TRUE; // Start with an optimistic mindset
foreach (glob($dirname . '/*', GLOB_NOSORT) as $file) {
if (is_dir($file)) {
if (!_boost_rmdir_rf($file, $callback))
$empty = FALSE;
}
else if (is_file($file)) {
if (function_exists($callback)) {
if (!$callback($file)) {
$empty = FALSE;
continue;
}
}
@unlink($file);
}
else {
$empty = FALSE; // it's probably a symbolic link
}
}
// The reason for this elaborate safeguard is that Drupal will log even
// warnings that should have been suppressed with the @ sign. Otherwise,
// we'd just rely on the FALSE return value from rmdir().
return ($empty && @rmdir($dirname));
}
/**
* Creates a symbolic link using a computed relative path where possible.
*/
function _boost_symlink($target, $link) {
if (!file_exists($target) || !file_exists(dirname($link)))
return FALSE;
$target = explode('/', $target);
$link = explode('/', $link);
// Only bother creating a relative link if the paths are in the same
// top-level directory; otherwise just symlink to the absolute path.
if ($target[1] == $link[1]) {
// Remove the common path prefix
$cwd = array();
while (count($target) > 0 && count($link) > 0 && reset($target) == reset($link)) {
$cwd[] = array_shift($target);
array_shift($link);
}
// Compute the required relative path
if (count($link) > 1)
$target = array_merge(array_fill(0, count($link) - 1, '..'), $target);
$link = array_merge($cwd, $link);
}
return symlink(implode('/', $target), implode('/', $link));
}
//////////////////////////////////////////////////////////////////////////////
// PHP4 COMPATIBILITY
if (!function_exists('file_put_contents')) {
function file_put_contents($filename, $data) {
if ($fp = fopen($filename, 'wb')) {
fwrite($fp, $data);
fclose($fp);
return filesize($filename);
}
return FALSE;
}
}
//////////////////////////////////////////////////////////////////////////////
; $Id$
name = Boost
description = Provides a performance and scalability boost through caching Drupal pages as static HTML files.
package = Caching
description = Caches text as static files
package = Performance and scalability
core = 7.x
files[] = boost.module
files[] = boost.admin.inc
configure = admin/config/development/performance/boost
@@ -7,21 +7,43 @@
*/
//////////////////////////////////////////////////////////////////////////////
// Core API hooks
/**
* Implementation of hook_install(). Installs the current version of the database schema.
* Implementation of hook_enable().
*/
function boost_install() {
// Ensure that the module is loaded early in the bootstrap:
db_query("UPDATE {system} SET weight = -90 WHERE name = 'boost'");
function boost_enable() {
}
/**
* Implementation of hook_disable().
*/
function boost_disable() {
// Make sure that the static page cache is wiped when the module is disabled:
boost_flush_caches();
drupal_set_message(t('Static page cache cleared.'));
}
// Forcibly disable Drupal's built-in SQL caching to prevent any conflicts of interest:
if (variable_get('cache', CACHE_DISABLED) != CACHE_DISABLED) {
variable_set('cache', CACHE_DISABLED);
drupal_set_message(t('Drupal standard page caching disabled by Boost.'));
}
/**
* Implementation of hook_install().
*/
function boost_install() {
}
drupal_set_message(t('Boost module successfully installed.'));
/**
* Implementation of hook_uninstall().
*/
function boost_uninstall() {
// Clear variables
$name = 'boost_';
db_delete('variable')
->condition('name', db_like($name) . '%', 'LIKE')
->execute();
cache_clear_all('variables', 'cache_bootstrap');
}
//////////////////////////////////////////////////////////////////////////////
/**
* Implementation of hook_requirements().
*/
function boost_requirements($phase) {
}
#
# Apache/PHP/Drupal settings:
#
# Protect files and directories from prying eyes.
<FilesMatch "(\.(engine|inc|install|module|sh|.*sql|theme|tpl(\.php)?|xtmpl)|code-style\.pl|Entries.*|Repository|Root)$">
Order deny,allow
Deny from all
</FilesMatch>
# Set some options.
Options -Indexes
Options +FollowSymLinks
# Customized error messages.
ErrorDocument 404 /index.php
# Set the default handler.
DirectoryIndex index.php
# Override PHP settings. More in sites/default/settings.php
# but the following cannot be changed at runtime.
# PHP 4, Apache 1
<IfModule mod_php4.c>
php_value magic_quotes_gpc 0
php_value register_globals 0
php_value session.auto_start 0
</IfModule>
# PHP 4, Apache 2
<IfModule sapi_apache2.c>
php_value magic_quotes_gpc 0
php_value register_globals 0
php_value session.auto_start 0
</IfModule>
# PHP 5, Apache 1 and 2
<IfModule mod_php5.c>
php_value magic_quotes_gpc 0
php_value register_globals 0
php_value session.auto_start 0
</IfModule>
# Reduce the time dynamically generated pages are cache-able.
<IfModule mod_expires.c>
ExpiresByType text/html A1
</IfModule>
# Various rewrite rules.
<IfModule mod_rewrite.c>
RewriteEngine on