My head's buzzing with an idea I had at work yesterday. OK - so your
web server goes down - that don't impress you much. Your
server monitor says "hey, I'm down, it's 2AM, come into work and fix it". You have a choice -
1. Come into work
2. Stay in bed / club / pub
I know I'd prefer "2" but it's not always practical - especially if you have to meet uptime targets, or even if you're paid to do that.
The point I'm trying to get at is this: if the server can tell me that it is down and needs my immediate attention, why can't it tell another device that it is its turn to take over for a short while?
The best, but rather expensive, solution is to have two of every server, so if webserver "A" goes down, webserver "B" takes over. As I say - expensive - but it will get you close to those 100% uptime figures you dream of.
Another solution could be to have a single server as a backup for all your servers, so if one goes "pop" you can rebuild the server with a new OS. But this is time consuming and impractical if it's only needed for a short period of time.
But if you had a quicker way of doing this (installing the OS), it would be perfect - AH - but we have, in the realm of live Linux distributions such as
Knoppix. I've recently been playing with Knoppix and I've found it to be a great learning aid for
fairly new Linux users (me), and it's great for fixing the old FAT file system based Windows installs.
After being suitably impressed, I have decided to buy a book on the subject with the thought of making a "live Linux distribution backup" of a server - and therefore changing the face of backup and disaster recovery procedure - as there will be a live CD that can be plugged straight in to take over from a server.
My vision is: if a server goes down, your server monitor informs another system, which boots your live distribution with a copy of your database / webserver / mailserver etc...
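
To make that a bit more concrete, here is a rough sketch of the "monitor informs another system" step. It rests on assumptions that are not part of the idea above: that the standby machine supports Wake-on-LAN, and that it is already set up to boot from the live CD when it powers on. The function names (is_up, wake_standby, check_and_failover) are made up for illustration, and getting a copy of the data onto the live system is not covered at all.

```python
import socket


def is_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + raw * 16


def wake_standby(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet (assumes the standby box has Wake-on-LAN enabled)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))


def check_and_failover(host, port, standby_mac, probe=is_up, wake=wake_standby):
    """Probe the server once; if it is down, wake the standby. Returns True if woken."""
    if probe(host, port):
        return False
    wake(standby_mac)
    return True
```

A real monitor would call check_and_failover on a timer and probably want a few failed probes in a row before waking anything, so one dropped connection doesn't start a second server.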