I get a lot of questions from new service teams about what they should do to prevent downtime, but very few people ask for advice on how to handle an incident. This is a bit like asking a boxer for the best way to avoid getting in the ring. It’s not a question of “if” you’re going to be in the ring but “when”. There’s an old saying: the more you bleed in the gym, the less you bleed in the ring. That definitely applies to incident management as well.
Having sat in on more war rooms than I’d like to remember, I thought it might be handy to write down some of the things that my team has found useful over the years. I think every service organization should have a standard approach towards three specific activities:
1. Tips for Handling Service Incidents (just one service)
2. Tips for Handling Service Outages (multiple services affected)
3. Tips for Handling System Maintenance
I hope these posts help you with your handling of incidents, outages, and maintenance. Success here is mostly about being prepared, staying calm, communicating well, and practice, practice, practice. If you think your service is bullet-proof and you won’t need the practice, you’re wrong :-)