I keep noticing startups (and larger companies, too) making some really basic mistakes that end up leading to a lot of downtime. Here's my list of 6 really easy things you can do to avoid major downtime.
1. Buy backup DNS service
This is so cheap it's a no-brainer. For about $15/year you can get a service that will constantly grab your DNS data and act as a backup if they happen to go down. Otherwise, when your DNS servers go haywire (it's happened to me and I've seen it happen to many others), you'll be stuck helpless for a few hours as people are unable to get to your site. [I've used No-IP Squared Backup, and Chris has used Nettica].
2. Buy a monitoring service
For $5/month, you can purchase a service that pings your servers every few minutes and sends you a text message if they go down. This is absolutely crucial, especially if things go to hell in the middle of the night, or any other time you might not normally be checking. Make sure to buy a service that monitors from at least 3 locations -- there's nothing worse than a few false alarms in the middle of the night, after which you won't get up for the real thing. [I've been happy with WebSitePulse -- their prices are a bit more expensive now, but you should be able to get them down to $5/month on the phone.]
3. Always make database backups before touching the database
It's one of those things you always consider and dismiss right before you bring the whole thing crashing down. Especially if you aren't making very regular backups (note: you should be), make sure to do so before you get your hands dirty. (Ever forget the WHERE clause? Not fun...)
4. Be VERY careful around power cords at Colos
Knocking out a power cord seems to happen consistently if you don't make a very concerted effort not to. It's extremely easy for a cord to jiggle a tiny bit, or for one moving server to pull on another cord in just the right way. Always plan out your server trajectories before you move them, or have someone to hold the power cords in. [Note: Why aren't snap-in power cords standard for rack mountable servers??]
5. Make your site functional in pieces
Even if your database is down, there's no reason your home page shouldn't still show, or any of your other static pages. There's a big difference from a user's point of view in between an otherwise seemingly functional site that shows a nice looking error message, and a site that spits out errors, is not accessible, or won't load at all. If Weebly's database goes down, users will see a polite "Sorry, something is wrong and we're fixing it right now" error message. Our site, blog, and all hosted user sites stay up, so a database crash just means that people can't edit their sites at the moment.
6. Use source management to roll out updates
We use darcs to manage our different source repositories, and it's a flexible and distributed system that works very well for us. Whatever you use, make sure you use some automated process to roll out updates (which doesn't include moving a directory and moving another in it's place, and, God forbid, manually diff'ing files -- there's always more to that than you anticipate). It's quite shameful to see pretty basic sites go down for hours (or days, or weeks) rolling out an update. If you're using Weebly while we push out an update, your session will automatically be refreshed without any loss of data, and you'll be up and running on the new version within seconds (with no downtime). [Darcs can be found at http://darcs.net/]
Those 6 items combined have probably caused over 80% of downtime I've been responsible for. What's your list?