The holy grail of network engineering is building a completely redundant network with no single point of failure, where outages are never seen by the end users and the network team is a happy, upbeat group of individuals who never get blamed for anything. The problem with adding redundancy is the complexity that comes with it. Sadly, in the lust for more 9s of uptime, you can build what I call the Rube Goldberg Network.
And a little Mythbusters example:
After watching this video, if you feel like Adam Savage when trying to talk about your network resiliency and redundancy, you may have a Rube Goldberg Network.
- When looking at solutions to provide redundancy, look for the simplest solution that provides what you need. SSO sounds really awesome, but it’s also more complex to troubleshoot when things don’t work. Are you hitting a bug, or is something else wrong?
- Design within your team’s skill level. Don’t rely on technology your team doesn’t have a good understanding of. If you need that particular technology, you may need to get the team some training on how to make it work.
- Ask your partner/vendor/Twitter buddy what they think. “You ‘could’ do that, but maybe this would be a better idea.” My coworkers joke that if I don’t like something I say “Hmmm, that’s interesting.” Don’t give me that “I don’t trust my partner/vendor.” If you don’t trust them, why are they your partner/vendor?
- Use technology as it is intended. CCIE stupid router tricks are just that. Tricks. Don’t use tricks.
- Test. Test. Test. Mock it up in a lab and test changes, repeatedly. Schedule quarterly/annual DR testing to make sure it all still works after it’s had months of change requests.
- Periodically review the decisions you’ve made, the outages you’ve had, and assess whether it’s working for you. Are the stakeholders happy with the uptime/stability/resiliency of the network?
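
If you want to put a number on that last review, here is a minimal sketch of the "how many 9s" arithmetic. It assumes you log outage durations in minutes and measure against a 365-day year; the function names and the sample outage list are made up for illustration, not taken from any tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_budget_minutes(nines: int) -> float:
    """Yearly downtime allowed by a target of `nines` nines (3 means 99.9%)."""
    return MINUTES_PER_YEAR / 10 ** nines


def measured_availability(outage_minutes: list[float]) -> float:
    """Fraction of the year the network was up, given outage durations."""
    return 1 - sum(outage_minutes) / MINUTES_PER_YEAR


def meets_target(outage_minutes: list[float], nines: int) -> bool:
    """True if total outage time fits inside the budget for the target."""
    return sum(outage_minutes) <= downtime_budget_minutes(nines)


if __name__ == "__main__":
    outages = [12, 45, 30]  # hypothetical outages this year, in minutes
    print(f"availability: {measured_availability(outages):.5%}")
    for n in (3, 4, 5):
        budget = downtime_budget_minutes(n)
        print(f"{n} nines: budget {budget:.2f} min, met: {meets_target(outages, n)}")
```

The budget shrinks by a factor of ten with every extra 9, which is a handy way to ask whether the next layer of redundancy (and the complexity that comes with it) is actually buying the stakeholders anything.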