Friday, April 30, 2010

Tips for Handling Events, Incidents, Outages, and Maintenance

I get a lot of questions from new service teams about what they should do to prevent downtime, but very few people ask for advice on how to handle an incident. This is a bit like asking a boxer for the best way to avoid getting in the ring. It’s not a question of “if” you’re going to be in the ring but “when”. There’s an old saying: the more you bleed in the gym, the less you bleed in the ring. That definitely applies to incident management as well.

Having sat in on more war rooms than I’d like to remember, I thought it might be handy to write down some of the things that my team has found useful over the years. I think every service organization should have a standard approach towards three specific activities:

1. Tips for Handling Service Incidents (just one service)
2. Tips for Handling Service Outages (multiple services affected)
3. Tips for Handling System Maintenance

I hope these posts help you with your handling of incidents, outages, and maintenance. Success here is mostly about being prepared, staying calm, communicating well, and practice, practice, practice. If you think your service is bullet-proof and you won’t need the practice – you’re wrong :-)
