Glib answer: Don't work alone!

There are two ways to think about this:

1 - Your product might actually be too complex for a single-person business. You could rotate on-call duty for situations like this, which means making sure that sales are big enough to support an additional partner or two.

2 - Perhaps you need to simplify your product? Think more critically about error handling? I don't know the details of this part of your service, but assuming these bad UUIDs came from HTTP POSTs, why does a series of wonky POSTs bring down your entire service? Typically, something like this would throw an unhandled exception that gets caught higher up in your web framework, which then returns some kind of 5xx error.

This paragraph is very C#-centric, but it should translate to other languages as well. Typically, I layer my error handling. Each operation is wrapped in a general exception handler that catches EVERYTHING and does some very basic logging. (ASP.NET does this for you and returns a 5xx error if your code throws an unhandled exception.) Then, closer to the actual operations that can fail, I catch the exceptions I can anticipate. Finally, I have basic sanity checks for things like making sure a string is really a UUID.
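A minimal sketch of those three layers, written in Python purely for illustration (the comment itself is about C#/ASP.NET). Everything here is hypothetical: the in-memory `ORDERS` store, the `NotFound` exception, `get_order`, `handle`, and the `(status, body)` tuples standing in for HTTP responses. It only shows where each kind of check sits, not how any particular framework implements it.

```python
import logging
import uuid

logger = logging.getLogger("service")


class NotFound(Exception):
    """Hypothetical 'expected' failure raised by the data layer."""


# Hypothetical in-memory store standing in for a real database.
ORDERS = {"3f2b8c1e-9d4a-4e6b-8f0a-1c2d3e4f5a6b": {"item": "widget"}}


def parse_uuid(value):
    """Layer 3: basic sanity check that a value really is a UUID."""
    try:
        return uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        return None


def load_order(order_id):
    """Layer 2: turn a failure we can anticipate into a specific exception."""
    try:
        return ORDERS[str(order_id)]
    except KeyError:
        raise NotFound(order_id) from None


def get_order(raw_id):
    """One hypothetical operation: validate input, handle the expected failure."""
    order_id = parse_uuid(raw_id)
    if order_id is None:
        return 400, {"error": "invalid id"}
    try:
        return 200, load_order(order_id)
    except NotFound:
        return 404, {"error": "not found"}


def handle(operation, *args):
    """Layer 1: catch-all around every operation, with very basic logging.

    Anything the inner layers did not anticipate becomes a 500 instead of
    taking down the process. Catching Exception (not BaseException) still
    lets KeyboardInterrupt and SystemExit through.
    """
    try:
        return operation(*args)
    except Exception:
        name = getattr(operation, "__name__", repr(operation))
        logger.exception("unhandled error in %s", name)
        return 500, {"error": "internal error"}
```

With this, `handle(get_order, "not-a-uuid")` returns a 400, a well-formed but unknown ID returns a 404, and a bug inside any operation (say, `handle(lambda: 1 // 0)`) is logged and turned into a 500 rather than crashing the service.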

Without knowing much about your service's architecture, it sounds like you need some high-level error handling. You probably have hundreds of other weird little bugs, so that high-level handler needs to do the equivalent of returning a 5xx error and logging, so you can fix the underlying problem when you're able to.




My point was less about the specific issue I hit and more that 1) external circumstances that a restart won't resolve can cause failures, because 2) we're human and no matter how hard we try, even with a large team, things do slip through.

The difference a large team makes is less that every possible failure case gets protected against (although more eyes and code review do help) and more that someone is always available to fix things when something unexpected happens.

In my particular case, the majority of the system kept running fine. The part that failed was a streaming system which receives updates in realtime from an external system. The error was actually localised to one particular type of update, but that type stopped working because I hadn't guarded defensively enough against errors in that one case (my database queries are protected against errors, but this one slipped through). As a result, other systems stopped getting those updates, so anything that relied on them stopped working. It's not that they crashed; they just never received the updates they were waiting for.

Of course, the fix is to trap all exceptions, log/notify, ignore the bad update and continue, so that one bad update doesn't affect the others. But again, my main point was that we're human, so we can't possibly protect against everything that might cause an error that can't be recovered from without human intervention.
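A rough Python sketch of that fix: each update is applied inside its own try/except, so one bad update is logged, reported, and skipped instead of stalling every update behind it. The update format (dicts with `key` and `value`), `apply_update`, and the optional `notify` hook are all hypothetical stand-ins; a real streaming client would plug in its own.

```python
import logging

logger = logging.getLogger("stream")


def consume(updates, apply_update, notify=None):
    """Apply each update independently so one bad update can't stall the rest.

    Returns (number_applied, list_of_failed_updates).
    """
    applied = 0
    failed = []
    for update in updates:
        try:
            apply_update(update)
        except Exception as exc:
            # Log with traceback, remember the update, tell a human, move on.
            logger.exception("failed to apply update %r", update)
            failed.append(update)
            if notify is not None:
                try:
                    notify(update, exc)
                except Exception:
                    # A broken alerting hook must not stop the stream either.
                    logger.exception("notify hook failed")
        else:
            applied += 1
    return applied, failed


# Example with a hypothetical update format: dicts with "key" and "value".
store = {}


def apply_update(update):
    # Raises KeyError on a malformed update that is missing "key".
    store[update["key"]] = update["value"]


applied, failed = consume(
    [{"key": "a", "value": 1}, {"value": 2}, {"key": "b", "value": 3}],
    apply_update,
)
```

Here the malformed middle update is logged and returned in `failed`, while the updates on either side of it still land in `store`. Keeping the failed updates around means they can be inspected, or reapplied once the bug is fixed.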

> Finally, I have basic sanity checks for things like making sure a string is really a UUID

Yes, I added this too after I hit this issue, and it's a good point: validate EVERYTHING, even data you generate yourself and assume will be good.

> Don't work alone!

That's the real solution, but sometimes it's not possible.

Thanks for your detailed response, though; it's appreciated.


> In my particular case, the majority of the system kept running fine. The part that failed was a streaming system which receives updates in realtime from an external system.

Is your product too complicated for a single-person business?

As a solo programmer, I can design and build extremely complicated systems. These systems can be so complicated that I don't have time to run them, find customers, support customers, etc.

That, ultimately, is why I don't see myself running a single-person business anytime soon. I really enjoy complicated programming, and if I also have to handle ops, support, sales, etc., then what I build has to be so simple that it stops being interesting.



