View Issue Details

ID: 1843
Project: Composr
Category: core
View Status: public
Last Update: 2015-04-03 12:28
Reporter: Chris Graham
Assigned To: Chris Graham
Priority: normal
Severity: feature
Status: resolved
Resolution: fixed
Summary: 1843: Failover mode
Description: If there's a failure on the front page, or performance has degraded badly, or the maximum number of online users is hit, failover mode activates automatically. This is managed by a CRON script that runs outside of Composr which can detect the situations (except the last case, which happens within Composr dynamically). It may also deactivate automatically.

On failover mode:
 - traffic is automatically delivered out of the guest/bot cache (so no logged in users)
  - this happens very early in initialisation, before any database connection is made
  - if there's a cache miss, it will say so, rather than trying a live version
 - login automatically disabled via a message (any login page hits will be mapped to an error screen)
 - a message is injected at the top and bottom of the screen saying there is a fault and no live updates will currently be saved
 - a "temporarily unavailable" HTTP status is used, so bots don't cache anything within failover mode

Prerequisites:
 - Failover mode is only available if the guest/bot cache is enabled

Failover mode is activated via a setting in info.php. It can have 3 values:
1) 1 (is on)
2) 0 (is off)
3) -1 (is off, but CRON is permitted to put it back to 1)

Failover mode can be bypassed via ?keep_failover=0 in the URL or forced with ?keep_failover=1

Document the behaviour of failover mode.
Document that CloudFlare's "Always Online" system can protect against the cases where you have a more severe hardware/network/DNS issue.
Document good free uptime monitoring tools.
Tags: No tags attached.
Time estimation (hours): 6
Sponsorship open

Activities

OneRingRules

2015-03-25 23:48

reporter   ~2663

http://mxtoolbox.com/ has some good and useful tools ...

On this solution what are we likely to see at the home page if a catastrophic php failure? Can that message be easily changed?

Would or could parts of the site keep going, i.e. view-only, if the severity wasn't significant?

Are there alerts to admins of events running?

Chris Graham

2015-03-26 00:03

administrator   ~2664

"On this solution what are we likely to see at the home page if a catastrophic php failure?"

If PHP itself is broken, this failover system wouldn't work, as it is still PHP driven. But that is an extremely rare event; much more likely is some issue in the database, or with Composr. This would be coded to kick in very early in the Composr bootup sequence, before plausible failure points come up.

I designed it to piggy-back on the existing guest cache, which is already designed to load early for performance reasons.

That said, maybe we could entirely bypass PHP without much extra work, using the Apache RewriteMap feature:
http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html
We'd just need to maintain the rewrite map, and make our failover script adjust .htaccess.

Points of failure that would not be covered include:
 - DNS failure (e.g. misconfiguration, or expiry, or failure of DNS server)
 - Basic server hardware failure (e.g. power failure, hard disk failure, some other physical fault)
 - Basic networking failure (e.g. network is down)
 - Basic low-level software failure (e.g. Apache fails, Linux crashes, that kind of thing)

However I think Cloudflare could solve the above to an extent. The situations Cloudflare can't handle are well handled by the above solution (e.g. site works but is very slow, or site is returning an error but Cloudflare doesn't get that).

I just realised I didn't answer the question you intended to ask. "What would be seen", the actual site page, just a guest version of it that doesn't allow any interactions (e.g. forum posting, rating, comments, etc). So it's like a read-only lock-down.

"Can that message be easily changed?". Yeah okay, we can make that an info.php value.

"Are there alerts to admins of events running?", ah I forgot that bit. Yeah we need to make a configurable email target too.

Chris Graham

2015-03-26 00:05

administrator   ~2665

"That said, maybe we could entirely bypass PHP without much extra work" well for the failover we could, but actually if the CRON script is in PHP, that could itself fail. But it seems unlikely this would happen, someone would need to do something horrific like delete PHP on the server, I don't think that's so likely.

Chris Graham

2015-04-03 00:23

administrator   ~2678

This is now implemented (though I haven't deployed it for you yet, Wayne).

I'm posting the tutorial in my next reply.

Chris Graham

2015-04-03 00:23

administrator   ~2679

[title sub="Written by Chris Graham, ocProducts"]Composr Tutorial: Failover mode[/title]

Composr has a special failover mode that can be activated if the site goes down for various possible reasons.
It is controlled by a special standalone CRON script that is individually configured.

This is an advanced feature: you need to be comfortable manually editing files and server configuration to use it.

[contents]decimal,lower-alpha[/contents]

[title="2"]How a site could go down[/title]

There are many ways a site could go down. Some are mitigated by failover mode, some are not.

Covered by failover mode:
 - More website hits than Composr can actually handle on your hardware (e.g. "slashdotting")
 - Database corruption that impacts your front page
 - Some kind of unexpected Composr-originating fault that impacts your front page

Not covered by failover mode:
 - Domain problems (e.g. your domain registration expires, your DNS becomes misconfigured, your DNS server is failing)
  - This is covered in the sense that failover-mode would activate and you'd be e-mailed (assuming your target e-mail isn't down too), but the server could not be reached anyway so it would have no positive impact
  - Mitigation: read your domain expiry/renewal notices, be careful when re-configuring DNS, use a robust DNS provider like CloudFlare (definitely not GoDaddy)
 - Your hardware fails (e.g. server freezes, harddisk fails, machine catches fire)
  - Mitigation: choose a decent host that does regular backups and is around to quickly do a full restore if something goes very wrong
 - Network problems (e.g. network routing problems, network failure)
  - Mitigation: choose a decent host that has redundant networking
 - Server disappears from the Internet (e.g. a server admin decides to unplug your server, your web host goes out of business, maintenance, power failure)
  - Mitigation: choose a decent webhost that can afford to stay in business, doesn't force significant down-time for maintenance, and has backup power
 - A server admin wipes your website (e.g. an accident, bills not paid)
  - Mitigation: choose a decent webhost who has proper backups, responsiveness, and will give you forewarning about missed invoices
 - Operating system malfunction (e.g. failed system update, error in web server or PHP configuration)
  - Mitigation: choose a decent webhost who will spot and fix such issues very quickly
 - Transfer quota ("monthly bandwidth") exceeded
  - This problem is very unlikely, as transfer quotas are very high nowadays
  - Mitigation: choose a decent webhost, and if you have high requirements monitor and plan capacity
 - Problems deeper within the Composr website (i.e. not impacting the front page)
  - This problem is less severe than your whole site going down
  - Mitigation: you can program the failover system to check multiple URLs
 - The failover mode CRON script becomes broken somehow, and you didn't notice
  - Having both a system failure, and failover failure, is definitely unlucky
  - Mitigation: HTTP requests to check your uptime are sent with a user-agent of 'Composr_failover_test'. You may look for this in your web logs to see it is still running.

In short, performance and Composr problems can usually be mitigated via failover mode, while external problems cannot. Most external problems are less likely though, as other people have direct responsibility for those areas, while you have direct responsibility for your own website.

You may want to consider using a service like Cloudflare (and/or multiple servers) if you are concerned about the reliability of your own server infrastructure. Cloudflare can cache page copies in a similar (but inferior) way to failover mode, so can be a nice extra safeguard. Problems at this level are much rarer than they once were however, and Cloudflare does have its own disadvantages (such as an extra point of failure, and extra latency). If 100% up-time is a critical requirement, you hopefully have the budget also to put in a really high-quality hosting infrastructure.

[title="2"]What failover mode does[/title]

Failover mode will serve a guest version of the site (i.e. no logins), with a message explaining that failover mode is active.
Any actions requiring database usage will not work (for example, forum posts). This is because failover mode activates before even a database connection is made (to maximise effectiveness).
Failover mode is an on-server solution, hence some of the discussion in the prior section.

[title="2"]Basic assumptions/requirements[/title]

 - 'shell_exec' should work from PHP.
 - The PHP executable should be in the path.
 - Native PHP mail must be available
 - Certain assumptions may be made about HTML structure, so with heavily-modified themes you may need to alter the PHP code a bit.
 - The actual administrative messages are sent in English only (as a standalone system, the failover system generally does not have Composr's own translation and theming support).
 - Your server should not be firewalled from checking its own URLs
 - [tt]info.php[/tt] will be written to using search & replace to toggle failover mode on and off, so the setting within [tt]info.php[/tt] needs to be written in a non-obfuscated way
 - The base URL must be explicitly set within [tt]info.php[/tt] (this is normal & recommended anyway)
 - The guest cache must be enabled
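
To illustrate why the [tt]failover_mode[/tt] setting needs to be written plainly, here is a minimal Python sketch of a search & replace toggle. It is not the code in [tt]failover_script.php[/tt] (which is PHP); the helper name [tt]set_failover_mode[/tt] and the exact matching rule are assumptions made for illustration.

```python
import re

# Hypothetical helper, not part of Composr: flips the failover_mode value
# inside the text of info.php using a plain search & replace.
def set_failover_mode(info_php_text, new_mode):
    # Assumed layout: $SITE_INFO['failover_mode']='<value>'; on one line.
    pattern = re.compile(r"(\$SITE_INFO\['failover_mode'\]\s*=\s*')[a-z_]+(';)")
    new_text, count = pattern.subn(
        lambda m: m.group(1) + new_mode + m.group(2), info_php_text
    )
    if count != 1:
        # A setting written in an obfuscated or duplicated way cannot be toggled safely.
        raise ValueError("expected exactly one plainly written failover_mode setting")
    return new_text

before = "<?php\n$SITE_INFO['failover_mode']='auto_off';\n"
after = set_failover_mode(before, 'auto_on')
```

A setting built up in some indirect way (for example via string concatenation) would not be found by a simple replace like this, which is the reason for the non-obfuscation requirement.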

[title="2"]Configuring[/title]

The bundled 'failover' addon must be installed.

There are a few aspects to configuring failover mode:
1) CRON script
2) Specific configuration in [tt]info.php[/tt]
3) Enabling the guest cache in [tt]info.php[/tt] (failover fallback is based off the guest cache)

Failover mode is activated/deactivated via the [tt]data/failover_script.php[/tt] CRON script. You need to set up CRON to execute this script very regularly, e.g. every minute.
This script should be set up in the same basic way as the [tt]data/cron_bridge.php[/tt] script Composr has. The failover script is separate because it runs standalone from the rest of Composr (if you're wondering, this is why it has to be hand-configured outside of Composr).

[tt]info.php[/tt] should be given new settings like:
[code]
$SITE_INFO['failover_mode']='auto_off';
$SITE_INFO['failover_message']='<div class="global_messages"><div class="box box___message"><div class="box_inner"><div class="global_message" role="alert"><img src="'.$SITE_INFO['base_url'].'/themes/default/images/messageicons/warn.png" alt="" /><span>We are currently experiencing some difficulties with our site. Logins and posting are temporarily disabled.</span></div></div></div></div>';
$SITE_INFO['failover_message_place_after']='</header>';
$SITE_INFO['failover_message_place_before']='<footer';
$SITE_INFO['failover_cache_miss_message']='We are currently experiencing some difficulties with our site. Unfortunately we don\'t have an offline version of this page available.';
$SITE_INFO['failover_loadtime_threshold']='5';
$SITE_INFO['failover_loadaverage_threshold']='5';
$SITE_INFO['failover_email_contact']='[email protected]';
$SITE_INFO['failover_check_urls']='index.php?page=start;forum/index.php?page=forumview';
[/code]
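
The two [tt]failover_message_place_*[/tt] settings are markers in the page HTML. As a rough Python sketch of the idea (not Composr's actual code; the helper name and the choice of first/last occurrence are assumptions), the message could be injected at the top and bottom of a cached page like this:

```python
# Hypothetical helper, not Composr code: inserts the failover message into a
# cached page after one marker and before another.
def inject_failover_message(html, message, place_after, place_before):
    # Top of the screen: directly after the first occurrence of place_after.
    top = html.find(place_after)
    if top != -1:
        cut = top + len(place_after)
        html = html[:cut] + message + html[cut:]
    # Bottom of the screen: directly before the last occurrence of place_before.
    bottom = html.rfind(place_before)
    if bottom != -1:
        html = html[:bottom] + message + html[bottom:]
    return html

page = '<body><header>Site</header><main>News</main><footer>(c)</footer></body>'
result = inject_failover_message(page, '<p>Fault</p>', '</header>', '<footer')
```

If a heavily-modified theme has no [tt]</header>[/tt] or [tt]<footer[/tt] in its output, a sketch like this would silently skip the message, so the markers would need changing to something the theme does output.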

The possible values for the [tt]failover_mode[/tt] setting are:
 - 'off' (no failover mode)
 - 'on' (manually declare the site has failed and you want to keep it in failover mode)
 - '[b]auto_off[/b]' (the [tt]failover_script.php[/tt] script is allowed to set it to 'auto_on' if it detects the site is failing)
 - 'auto_on' (the [tt]failover_script.php[/tt] script is allowed to set it to 'auto_off' if it detects the site is no longer failing)
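
The four values above amount to a small state machine. A minimal Python sketch of the transitions as described (the function name is hypothetical, and the real script is PHP):

```python
# Hypothetical sketch of the mode transitions described above.
def next_failover_mode(current_mode, site_is_failing):
    if current_mode == 'auto_off' and site_is_failing:
        return 'auto_on'
    if current_mode == 'auto_on' and not site_is_failing:
        return 'auto_off'
    # 'on' and 'off' are manual settings, so the script leaves them alone.
    return current_mode
```

For example, [tt]next_failover_mode('off', True)[/tt] stays at 'off': a manual setting is never overridden by the automatic check.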

[tt]failover_loadtime_threshold[/tt] specifies the maximum time in seconds that the front page may take to load; a slower load triggers failover mode.

[tt]failover_loadaverage_threshold[/tt] specifies the minimum [tt]load-average[/tt] number that will trigger failover mode. This only works on systems that support the 'uptime' command (e.g. Linux) or queries via COM (possibly Windows, depending on PHP configuration). Don't bother trying to work out what the numbers mean; tune them based on system norms. A higher number means a higher server CPU load.
There are no checks for I/O or memory bottlenecks, but these tend to reflect in the CPU load, or at least the page load-time.
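
A minimal Python sketch of how the two thresholds could combine. The function names are hypothetical, and whether the real script compares with "greater than" or "greater than or equal" is an assumption here:

```python
import os

# Hypothetical helper: returns the 1-minute load average, or None if the
# platform cannot provide it.
def read_load_average():
    try:
        # os.getloadavg() is only available on Unix-like systems.
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        return None

# Hypothetical helper: True if either threshold has been exceeded.
def is_overloaded(load_time_seconds, load_average, loadtime_threshold, loadaverage_threshold):
    if load_time_seconds > loadtime_threshold:
        return True
    # A missing load average (unsupported platform) is simply not checked.
    if load_average is not None and load_average > loadaverage_threshold:
        return True
    return False
```

With both thresholds at 5 as in the example settings, a page taking 6.2 seconds would trip the check even on an idle server, and a load average of 7.5 would trip it even if the page itself loaded quickly.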

Set [tt]failover_email_contact[/tt] to the e-mail address you wish to receive alerts on (when failover is automatically activated or deactivated). If you need to mail multiple people, separate using semicolons ([tt];[/tt]). Note that there are services to change e-mails to SMS messages, and you can usually configure smartphones to do notifications on e-mails matching certain patterns.

The [tt]failover_check_urls[/tt] setting lets you define multiple URLs (separated by a semicolon) to check (for failing HTTP statuses, or long load-times). These URLs will all be looked up beneath the base URL.
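
As an illustration only, the following Python sketch expands the setting into full URLs and times a request to each using the 'Composr_failover_test' user-agent mentioned earlier. The function names and the 10 second timeout are assumptions; the real check is done by [tt]failover_script.php[/tt].

```python
import time
import urllib.error
import urllib.request

# Hypothetical helper: turns the semicolon-separated setting into full URLs
# beneath the base URL.
def build_check_urls(base_url, failover_check_urls):
    relative = [u for u in failover_check_urls.split(';') if u != '']
    return [base_url.rstrip('/') + '/' + u.lstrip('/') for u in relative]

# Hypothetical helper: returns (HTTP status or None, seconds taken).
def check_url(url, timeout_seconds=10):
    request = urllib.request.Request(url, headers={'User-Agent': 'Composr_failover_test'})
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            status = response.status
    except urllib.error.HTTPError as e:
        # The server answered, but with a failing status code.
        status = e.code
    except OSError:
        # Connection failures and timeouts: no status at all.
        status = None
    return status, time.monotonic() - started

urls = build_check_urls(
    'http://example.com',
    'index.php?page=start;forum/index.php?page=forumview',
)
```

Each returned status and load-time could then be compared against the thresholds above.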

To enable the guest cache, add this to [tt]info.php[/tt]:
[code]
$SITE_INFO['fast_spider_cache']=3;
$SITE_INFO['any_guest_cached_too']='1';
[/code]

[title="3"]RewriteMap (advanced)[/title]

[url="RewriteMap"]http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html[/url] is a special Apache-only feature for bypassing PHP entirely during failover-mode.

To use this setting you need to have access to manually edit your main Apache configuration.
[quote="Apache manual"]
The RewriteMap directive may not be used in <Directory> sections or .htaccess files. You must declare the map in server or virtualhost context.
[/quote]
The config lines added will look something like this:
[code]
RewriteMap failover_mode txt:/home/someuser/public_html/data_custom/failover_rewritemap.txt
RewriteMap failover_mode__mobile txt:/home/someuser/public_html/data_custom/failover_rewritemap__mobile.txt
[/code]

The [tt]failover_apache_rewritemap_file[/tt] setting in [tt]info.php[/tt] defines a pattern (regular expression) of URLs that should be put into the RewriteMap file during static cache population.

[code]
$SITE_INFO['failover_apache_rewritemap_file']='((site/)?index\.php\?page=\w+(&type=\w+)?)|((site/)?pg/\w+(/\w+)?)|((site/)?\w+(/\w+)?\.htm)';
[/code]

Don't include too much in the pattern, or the file will get very large and inefficient for Apache to process and Composr to maintain -- just put your core URLs in. The above example is for all sub-ID-level pages in the Composr welcome and site zones.
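
To get a feel for what the example pattern lets in, here is a small Python sketch that tests URLs against it. It assumes the pattern has to match the whole URL relative to the base URL; how Composr actually applies the pattern is not stated here, so treat that as an assumption.

```python
import re

# The example pattern from the setting above.
PATTERN = r'((site/)?index\.php\?page=\w+(&type=\w+)?)|((site/)?pg/\w+(/\w+)?)|((site/)?\w+(/\w+)?\.htm)'

# Hypothetical helper: would this relative URL go into the RewriteMap file?
def belongs_in_rewritemap(relative_url, pattern=PATTERN):
    # Assumption: the whole relative URL must match, not just part of it.
    return re.fullmatch(pattern, relative_url) is not None

candidates = [
    'index.php?page=start',
    'site/index.php?page=news&type=view',
    'site/index.php?page=news&type=view&id=3',
    'site/pg/news/view',
    'forum/index.php?page=forumview',
]
selected = [u for u in candidates if belongs_in_rewritemap(u)]
```

Under that assumption the URL with an [tt]id[/tt] parameter and the forum zone URL are left out, which keeps the file down to the core pages.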

If left empty, there will be no RewriteMap used. If set, the [tt].htaccess[/tt] file will automatically have the RewriteMap enabled/disabled along with failover-mode.

You must make sure the [tt]data_custom/failover_rewritemap.txt[/tt] and [tt]data_custom/failover_rewritemap__mobile.txt[/tt] files are writable by the web server.

[title="2"]Monitoring up-time generally[/title]

For extra peace of mind you may wish to set up an uptime monitor such as [url="Uptime Robot"]https://uptimerobot.com/[/url].
This will help warn you about the "Not covered by failover mode" situations described above.

[concepts
 1_key="Failover" 1_value="When a site automatically goes into a fallback mode when a problem happens"
 2_key="Slashdotting" 2_value="When a site is getting more hits than it can take. Named after the popular slashdot.org site, which at peak popularity would often knock out the sites that their headline stories linked to"
]Concepts[/concepts]

[title="2"]See also[/title]

 - [page="_SEARCH:tut_disaster"]Disaster recovery[/page]
 - [page="_SEARCH:tut_sql"]Manually editing your database with phpMyAdmin[/page]
 - [page="_SEARCH:tut_configuration"]Basic Configuration[/page]
 - [page="_SEARCH:tut_adv_configuration"]Advanced Configuration[/page]
 - [page="_SEARCH:tut_optimisation"]Optimisation[/page]

Chris Graham

2015-04-03 12:28

administrator   ~2680

I decided to make a few more improvements, mainly...

1) If it detects that failover mode is no longer needed, it will wait around for a minute, checking that it doesn't need to switch it straight back on again. This is to stop it flickering on/off/on/off each time CRON runs, due to failover mode having temporarily alleviated the performance issues and so giving a false "now it's okay" signal.

2) You can now use ?keep_failover=0 to force failover mode off for a session. This is useful when forcing things to get cached into the static cache manually, when you still want all other hits to come out of the static cache due to a hit flood -- or when testing if things are safe to manually turn on after manually turned off.

3) An alternative to the RewriteMap that doesn't require Apache config access is now implemented. It does require PHP to continue to work, but Composr can be in an even more broken state, i.e. the first couple of Composr files don't even need to be able to bootstrap.
This is actually a really convenient alternative to closing a site when upgrading it, as you can totally trash Composr and still have things running.
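
To illustrate point 1, here is a rough Python sketch of the "wait around and re-check" idea. The function name, the number of checks and the pause between them are assumptions chosen to add up to about a minute; the actual script is PHP and may space its checks differently.

```python
import time

# Hypothetical sketch: only report recovery if the site stays healthy
# across several checks spread over roughly a minute.
def confirm_recovered(site_is_failing, checks=6, pause_seconds=10, sleep=time.sleep):
    for _ in range(checks):
        if site_is_failing():
            # One bad check is enough to stay in failover mode.
            return False
        sleep(pause_seconds)
    return True
```

Passing [tt]sleep[/tt] in as a parameter is just a convenience so the sketch can be tried out without actually waiting.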
