I keep noticing startups (and larger companies, too) making some really basic mistakes that end up leading to a lot of downtime. Here's my list of 6 really easy things you can do to avoid major downtime.

1. Buy backup DNS service
This is so cheap it's a no-brainer. For about $15/year you can get a service that will constantly grab your DNS data and act as a backup if your primary DNS servers happen to go down. Otherwise, when your DNS servers go haywire (it's happened to me and I've seen it happen to many others), you'll be stuck helpless for a few hours as people are unable to get to your site. [I've used No-IP Squared Backup, and Chris has used Nettica.]

2. Buy a monitoring service
For $5/month, you can purchase a service that pings your servers every few minutes and sends you a text message if they go down. This is absolutely crucial, especially if things go to hell in the middle of the night, or any other time you might not normally be checking. Make sure to buy a service that monitors from at least 3 locations -- there's nothing worse than a few false alarms in the middle of the night, after which you won't get up for the real thing. [I've been happy with WebSitePulse -- their prices are a bit more expensive now, but you should be able to get them down to $5/month on the phone.]
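To make the "at least 3 locations" point concrete, here is a minimal sketch of the consensus idea: only page someone when several vantage points agree the site is down. The function names, the probe callables, and the threshold of 2 are all assumptions for illustration; this is not how WebSitePulse or any particular service works internally.

```python
from typing import Callable, Dict


def poll(probes: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run one probe per monitoring location and record whether the site looked up.

    Each probe is a hypothetical callable that returns True when the site
    responded. A probe that raises is counted as "down" for that location.
    """
    results: Dict[str, bool] = {}
    for location, probe in probes.items():
        try:
            results[location] = bool(probe())
        except Exception:
            results[location] = False
    return results


def should_alert(results: Dict[str, bool], min_failures: int = 2) -> bool:
    """Alert only when at least `min_failures` locations saw the site down.

    A single failing location is treated as a network blip near that probe,
    which is what keeps false alarms from waking you up at night.
    """
    failures = sum(1 for is_up in results.values() if not is_up)
    return failures >= min_failures
```

With three locations and a threshold of two, one flaky probe stays quiet, while a real outage seen from everywhere still triggers the text message.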

3. Always make database backups before touching the database
It's one of those things you always consider and dismiss right before you bring the whole thing crashing down. Especially if you aren't making very regular backups (note: you should be), make sure to do so before you get your hands dirty. (Ever forget the WHERE clause? Not fun...)
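One way to stop dismissing the backup is to make it part of the command itself. The sketch below assumes a SQLite database and uses Python's standard `sqlite3` module; the helper name `backup_then_execute` is made up for this example, and the WHERE check is deliberately crude (it only looks for the keyword). For MySQL or PostgreSQL you would swap the backup step for your usual dump tool.

```python
import sqlite3


def backup_then_execute(conn, backup_path, sql, params=()):
    """Snapshot the database to `backup_path`, then run one statement.

    Refuses to run an UPDATE or DELETE that has no WHERE keyword at all.
    Returns the number of rows the statement affected.
    """
    words = sql.upper().split()
    if words and words[0] in ("UPDATE", "DELETE") and "WHERE" not in words:
        raise ValueError("refusing to run UPDATE/DELETE without a WHERE clause")

    # Take the backup first, so a bad statement can always be undone.
    backup = sqlite3.connect(backup_path)
    try:
        conn.backup(backup)
    finally:
        backup.close()

    # The connection context manager commits on success, rolls back on error.
    with conn:
        return conn.execute(sql, params).rowcount
```

Calling `backup_then_execute(conn, "before_cleanup.db", "DELETE FROM users WHERE id = ?", (42,))` leaves a full copy in `before_cleanup.db` (a hypothetical file name) before the row is removed.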

4. Be VERY careful around power cords at Colos
Knocking out a power cord seems to happen consistently if you don't make a very concerted effort not to. It's extremely easy for a cord to jiggle a tiny bit, or for one moving server to pull on another cord in just the right way. Always plan out your server trajectories before you move them, or have someone to hold the power cords in. [Note: Why aren't snap-in power cords standard for rack mountable servers??]

5. Make your site functional in pieces
Even if your database is down, there's no reason your home page shouldn't still show, or any of your other static pages. From a user's point of view, there's a big difference between an otherwise functional site that shows a nice-looking error message and a site that spits out errors, is not accessible, or won't load at all. If Weebly's database goes down, users will see a polite "Sorry, something is wrong and we're fixing it right now" error message. Our site, blog, and all hosted user sites stay up, so a database crash just means that people can't edit their sites at the moment.
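As a rough illustration of "functional in pieces", the sketch below renders the static part of a page no matter what and swaps the database-backed part for a polite notice when the database is unreachable. The names `render_page`, `load_dynamic`, and `DatabaseUnavailable` are hypothetical; this is not Weebly's actual code, just the general shape of the idea.

```python
from typing import Callable

FRIENDLY_ERROR = "Sorry, something is wrong and we're fixing it right now."


class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) data layer when the database is down."""


def render_page(static_html: str, load_dynamic: Callable[[], str]) -> str:
    """Return the static page plus dynamic content, degrading gracefully.

    `load_dynamic` is whatever hits the database. If it fails, the visitor
    still gets the static page with a short notice instead of a raw error.
    """
    try:
        dynamic = load_dynamic()
    except DatabaseUnavailable:
        dynamic = '<p class="notice">' + FRIENDLY_ERROR + "</p>"
    return static_html + dynamic
```

Only the database-specific exception is caught here, so genuine bugs in the page code still surface during development instead of hiding behind the friendly message.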

6. Use source management to roll out updates
We use darcs to manage our different source repositories, and it's a flexible and distributed system that works very well for us. Whatever you use, make sure you use some automated process to roll out updates (which doesn't include moving a directory and moving another in its place, and, God forbid, manually diff'ing files -- there's always more to that than you anticipate). It's quite shameful to see pretty basic sites go down for hours (or days, or weeks) rolling out an update. If you're using Weebly while we push out an update, your session will automatically be refreshed without any loss of data, and you'll be up and running on the new version within seconds (with no downtime). [Darcs can be found at http://darcs.net/]
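For a sense of what an automated rollout step can look like, here is a minimal sketch of one common technique: keep each release in its own directory and atomically repoint a `current` symlink at the new one. To be clear, this is not the darcs-based process described above; it is a separate, generic approach, and the directory layout, the function name `activate_release`, and the POSIX filesystem are all assumptions.

```python
import os


def activate_release(releases_dir: str, release_name: str, current_link: str) -> str:
    """Point `current_link` at releases_dir/release_name in one atomic step.

    The new symlink is created under a temporary name and then renamed over
    the old one, so the web server never sees a missing or half-moved
    directory. Returns the resolved path that is now live.
    """
    target = os.path.join(releases_dir, release_name)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)

    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clean up after an interrupted earlier run

    os.symlink(target, tmp_link)
    os.replace(tmp_link, current_link)  # atomic rename on POSIX filesystems
    return os.path.realpath(current_link)
```

Rolling back is the same call with the previous release name, which is a big part of why a scripted step beats shuffling directories by hand.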

Those 6 items combined have probably caused over 80% of downtime I've been responsible for. What's your list?

 


Comments

Sat, 25 Aug 2007 12:58:08

Mon.itor.Us (http://mon.itor.us) is a free website monitoring service that monitors from 3 different locations and sends IM and email notifications.

 

Mon, 27 Aug 2007 01:18:13

I really like this list. The 3rd tip, about making a backup of the database before touching it, is easier said than done. Cron jobs can help, too. Make a backup of the whole website regularly, especially if you work only on the server, change the whole code, and forget to back it up. I've made it a habit. Discipline is required.

 

Mon, 27 Aug 2007 19:03:23

I don't understand a word of what you just typed.

 

Stephen Hebert

Tue, 28 Aug 2007 09:05:37

Not that I don't agree with #3 (because I do), but it helps to type the WHERE clause first when you're working live, just to eliminate the possibility that you might accidentally hit ENTER after typing the UPDATE or DELETE bit.

 

Sat, 15 Sep 2007 22:11:07

Good list. I can't believe the amount of downtime some people have. Blows me away. Anyway, good list of things to keep track of and be wary of.

 

Bob Clarke

Mon, 05 Nov 2007 10:17:33

This is a great list. You could probably write a great step-by-step how-to book on doing a web start-up. There are a lot of newbies out there who have no idea about cost, logistics, or implementation -- why not give them some pointers?

 

Tue, 29 Jan 2008 05:28:27

ah.. I hate downtimes.. thanks for your post!!

 




    Author

    David Rusenko is a founder at Weebly, a company that makes a web creation tool that doesn't suck. He's also a part-time DJ and traveling enthusiast.
