I get a lot of questions from new service teams about what they should do to prevent downtime, but very few people ask for advice on how to handle an incident. This is a bit like asking a boxer for the best way to avoid getting in the ring. It’s not a question of “if” you’re going to be in the ring but “when”. There’s an old saying: the more you bleed in the gym, the less you bleed in the ring. That definitely applies to incident management as well.
Having sat in on more war rooms than I’d like to remember, I thought it might be handy to write down some of the things that my team has found useful over the years. I think every service organization should have a standard approach to three specific activities:
1. Tips for Handling Service Incidents (just one service)
2. Tips for Handling Service Outages (multiple services affected)
3. Tips for Handling System Maintenance
I hope these posts help you handle incidents, outages, and maintenance. Success here is mostly about preparation, staying calm, good communication, and practice, practice, practice. If you think your service is bullet-proof and you won’t need the practice, you’re wrong :-)