All shared web hosting, reseller hosting (which is the same thing), and VPS hosting is based on stacking loads and loads of websites on a single physical server.

Servers have become more and more reliable over the years, and it’s common for them to run for several years without crashing.

If a single server breaks, and your sites are on there, or your customers’ sites, it’s a world of pain. Hosts all put spare and redundant hardware in that server to make it as reliable as possible. But eventually the inevitable happens.

With a clustered solution, your site is duplicated to several different servers. wpdone distributes your sites to 3 different servers, in 3 different data centres. We might suffer the same hardware crash, but we’ll just turn off the affected server and run your sites from a working server. We can then recover the server in peace.
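
To make the idea concrete, here is a minimal sketch of the failover decision: given several copies of a site, serve from the first one that is healthy. The `Replica` class, the server names, and the `healthy` flag are all hypothetical and only for illustration; this is not wpdone's actual implementation, and a real cluster would also need health checks and traffic routing.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str      # hypothetical server label
    healthy: bool  # result of some health check (assumed to exist)


def pick_serving_replica(replicas):
    """Return the first healthy replica, or None if every copy is down."""
    for replica in replicas:
        if replica.healthy:
            return replica
    return None


# Three copies of a site in three data centres (names are made up).
replicas = [
    Replica("dc1-server", healthy=False),  # the server that just crashed
    Replica("dc2-server", healthy=True),
    Replica("dc3-server", healthy=True),
]

serving = pick_serving_replica(replicas)
print(serving.name if serving else "no healthy replica")
```

The crashed server is simply skipped, so it can be repaired without any site depending on it.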

The Problem

When one of these servers has a hardware crash, or has a software problem, or is hacked – the results can be dramatic. There is a knock-on effect, and it generally takes 4-10 days to recover. This is why:

  • the thousands of sites will take hours or days to recover from backup onto a new server
  • each account on there may need some love and care, like fixing a broken MySQL database. So instead of a few support tickets, there might be a hundred times as many. No one has 100 times the support staff to call in.
  • whilst the server is recovering backup data, fixing MySQL databases, and checking filesystems, it can’t really do anything else
  • the servers are pushed close to CPU capacity on a normal day; even with responsible hosts, it’s likely to be 70%. But there is also memory capacity and disk IOPS capacity. So during the recovery period it is really going to suck mud.
    • you only need a few customers on the server, with millions of files, and the filesystem checking will be horrendous
    • then a few sites will be recovering their MySQL backups with millions of rows
    • then everyone will be taking a panicked cPanel account backup
    • none of the accounts will be going into sleep mode and giving up their resources as idle; it’s just one massive battle against all your neighbours
    • and even if it is only normal web traffic coming back in, that further hampers the recovery process.
  • you have no real idea what short cuts they’ve taken to get that number of accounts on the server in the first place. Full sync writes, all the way to the disk, are a huge problem for scaling lots of accounts on a single server. Full sync makes sure that after a massive crash, once the server is turned back on, everything will be in a tidy, known and recoverable state. Without full sync there is a lot of corruption. I don’t have direct evidence of this, but the sheer volume of recoveries needed leads me to suspect short cuts have been taken.
    • in particular with VPS disk writes, every write needs to be treated as meta-data, and every write needs to be full sync. But this becomes a bottleneck, so I am pretty sure short cuts are taken.
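
For readers who have not met the term, this is roughly what a "full sync" write looks like from an application's point of view. It is a small sketch in Python using the standard `os.fsync` call. The function names and file names are made up for illustration, and this says nothing about how any particular host or hypervisor is actually configured.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes and block until the OS reports them flushed to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device


def fast_write(path, data):
    """Write bytes and return once the OS has them; they may still be in cache."""
    with open(path, "wb") as f:
        f.write(data)


workdir = tempfile.mkdtemp()
durable_path = os.path.join(workdir, "durable.bin")
fast_path = os.path.join(workdir, "fast.bin")

durable_write(durable_path, b"order #1\n")
fast_write(fast_path, b"order #1\n")
```

Both files look identical while the server is running. The difference only shows up after a crash: the data from the fast path may never have reached the disk, and waiting for the sync on every write is exactly the cost that becomes a bottleneck when many accounts share one machine.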

So the outcome is

  • ‘all hands on deck’ – the hosting guys are definitely giving you support
  • but there is so much waiting time for the recovery
  • and then the server is just slow to respond to everything: support requests, web requests, fixup procedures
  • it ends up taking 4-10 days to get every account sorted.
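
A rough back-of-envelope calculation shows how the waiting time adds up before any per-account fixing even starts. Every number below is a hypothetical assumption chosen for illustration (site count, average site size, restore throughput), not a measurement from any real host.

```python
def restore_hours(total_gb, throughput_mb_per_s):
    """Hours needed to copy total_gb of backup data at a sustained throughput."""
    seconds = (total_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


# Hypothetical figures, not taken from any provider.
sites = 2000
avg_site_gb = 2
throughput_mb_per_s = 50  # assumed sustained restore rate on a busy server

hours = restore_hours(sites * avg_site_gb, throughput_mb_per_s)
print(f"{hours:.1f} hours just to copy the data back")
```

With these assumed figures the raw copy alone takes most of a day, and that is before filesystem checks, database repairs, and support tickets compete for the same disks.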

The evidence

First of all, I am not ‘bashing’ other providers. This isn’t naming and shaming, but without concrete, real examples, all of this is difficult to believe.

A few weeks ago, netvirtue here in Australia had some big problems on a single server, and the customers on there were upset:

[Screenshot of forums.whirlpool.net.au, 2016-02-14]

From the thread, you can see it went on for days, perhaps a week for some accounts. Source: http://forums.whirlpool.net.au/archive/2495421

And the big outage of the week was Heart Internet in the UK. These guys have been down for 4 days, and counting. Again, for some customers it will be a week of outages. Check out the guy in the YouTube video below; he is the best testament to why single server hosting is bad.

[Video: Heart Internet server outage]


The biggest difference with clustered hosting is how disaster recovery happens.

The Solution

With single server hosting, the recovery is slow, generates lots of support tickets, and takes multiple days. There is a lot of sweat, swearing, customer calls, and long nights.

Clustered hosting recovers in a few seconds. Then we can recover the broken server in peace, without support tickets or customer calls, in business hours, without pressure, and without receiving web traffic (which would only slow everything down).

