> I was able to sign in to the AWS console and resolve the issue
Kids these days.
I had a RAM stick fry in one of the physical machines sitting in a colo an hour's drive away. Not die, but just start flipping bits here and there, triggering the most bizarre alerts you can imagine. On the night of December 24th. Now, that was fun.
--- To add ---
If you are a single founder - expect downtime and expect it to be stressful. Inhale, exhale, fix it, explain, apologize and then make changes to try and prevent it from happening again. Little by little, weak points will get fortified or eliminated and the risk of "incidents" will go down. There's no silver bullet, but with experience things become easier and less scary.
Live and learn is what I think the takeaway of this story is all about... I had a server fail Dec 25 mid-morning. It caused failures in a way I hadn't thought about before, because instead of appearing completely dead it was alive enough to not let go of any TCP connections. For the critical component in question, I didn't have the correct timeouts in place... so as the single operator I was fortunate that my wife was also my co-founder and so was a bit more understanding.
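For anyone who hasn't hit this failure mode: the fix is to put explicit timeouts on every network call to a critical dependency, so a half-dead peer that accepts TCP connections but never responds fails fast instead of hanging you. A minimal sketch in Python, assuming a hypothetical upstream service and illustrative timeout values (none of these names are from the original comment):

```python
import socket

# Hypothetical upstream service; host, port, and timeouts are illustrative.
UPSTREAM = ("upstream.internal.example", 5432)

def fetch_status(timeout_s: float = 5.0) -> bytes:
    # create_connection() bounds the TCP handshake itself, so a peer that
    # never completes the connect can't stall us.
    with socket.create_connection(UPSTREAM, timeout=timeout_s) as conn:
        # settimeout() bounds every later send/recv, so a peer that holds the
        # connection open without replying raises socket.timeout instead of
        # blocking forever.
        conn.settimeout(timeout_s)
        conn.sendall(b"STATUS\n")
        return conn.recv(4096)

if __name__ == "__main__":
    try:
        print(fetch_status())
    except (socket.timeout, OSError) as exc:
        # Fail fast; let the caller retry, fail over, or page someone.
        print(f"upstream unhealthy: {exc}")
```

The same idea applies to whatever client library sits in front of the socket (database drivers, HTTP clients, message queues): the default of "wait indefinitely" is exactly what turns a flaky box into a full outage.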