Hacker News new | past | comments | ask | show | jobs | submit login

Sounds like a terrible situation. I wish those guys luck.

One useful sys ops practice is the creation and yearly validation of disaster recovery runbooks. We have a validated catalog of runbooks that describe the recovery process for each part of our infrastructure. The validation process involves provoking a failure (eliminate a master database), running the documented recovery steps and then validating the result. The validation process is a lot easier if you're in the cloud since it's cheap and easy to set up a validation environment that mirrors your production env.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: