
Not before each change, but before any major new initiative or refactor. Having those numbers up front is the only way to make appropriate trade-offs.

Having this conversation with stakeholders often educates them on the cost of performance, too. Getting 100% of responses under 200 ms is frequently orders of magnitude more expensive than getting 99% of them there, and stakeholders usually grasp that quickly once you show them the budget numbers.
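To make the 99%-vs-100% point concrete when showing stakeholders numbers, here is a minimal sketch that reports a percentile next to the worst case for one latency budget. The function names, the nearest-rank percentile definition, and the sample data are all my own assumptions for illustration. The samples are synthetic, not measurements.

```python
import math
from typing import Dict, Sequence

def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it (p is an integer from 1 to 100)."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # Multiply before dividing so that, e.g., 99% of 1000 is exactly rank 990.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

def latency_report(samples_ms: Sequence[float], budget_ms: float = 200.0) -> Dict[str, float]:
    """Contrast the 'typical' numbers with the worst case for a single budget."""
    under = sum(1 for s in samples_ms if s < budget_ms)  # strictly "sub-budget"
    return {
        "p50_ms": percentile(samples_ms, 50),
        "p99_ms": percentile(samples_ms, 99),
        "max_ms": max(samples_ms),
        "share_under_budget": under / len(samples_ms),
    }

# Synthetic data (hypothetical): 990 fast responses and 10 slow outliers.
samples = [50.0] * 990 + [3000.0] * 10
report = latency_report(samples)
print(report)
```

With this data the p99 is 50 ms while the max is 3000 ms. Closing that gap is the expensive last 1%, and it is the part of the budget the conversation needs to price explicitly.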




Replying to the second paragraph: there is often real value in maintaining a strong upper bound on latency, especially in distributed real-time systems (which is most real systems, anyway).

E.g., compare (99% under 200 ms and 1% _unbounded_) with (80% under 200 ms and _always_ under 500 ms). The first potentially means 1% of requests ending in unanticipated crashes (hell to debug and explain to customers!); the second means a highly reliable system and happy customers.
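One way to see why the bounded profile wins for a real-time consumer is to ask two questions: can the service guarantee a hard deadline at all, and how likely is a cycle of several requests to hit the unbounded tail? The sketch below encodes the two profiles from the comment; the `LatencyProfile` type, the function names, and the assumption that requests are independent are hypothetical modeling choices of mine, not anything from the thread.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LatencyProfile:
    fast_fraction: float            # share of requests that finish under fast_ms
    fast_ms: float
    worst_case_ms: Optional[float]  # None means there is no known upper bound

def guarantees_deadline(profile: LatencyProfile, deadline_ms: float) -> bool:
    """A hard deadline can only be promised if a worst-case bound exists and fits."""
    return profile.worst_case_ms is not None and profile.worst_case_ms <= deadline_ms

def chance_of_unbounded_event(profile: LatencyProfile, n_requests: int) -> float:
    """Probability that at least one of n requests lands in an unbounded tail,
    assuming (hypothetically) that requests are independent and that everything
    outside the fast bucket is unbounded when no worst case is known."""
    if profile.worst_case_ms is not None:
        return 0.0  # every slow request is still bounded
    return 1 - profile.fast_fraction ** n_requests

fast_but_unbounded = LatencyProfile(fast_fraction=0.99, fast_ms=200, worst_case_ms=None)
slower_but_bounded = LatencyProfile(fast_fraction=0.80, fast_ms=200, worst_case_ms=500)

for name, profile in [("unbounded", fast_but_unbounded), ("bounded", slower_but_bounded)]:
    print(name,
          guarantees_deadline(profile, deadline_ms=500),
          round(chance_of_unbounded_event(profile, n_requests=100), 3))
```

Under these assumptions, a cycle that issues 100 requests against the 99%-fast service hits the unbounded tail roughly 63% of the time, while the bounded service never breaks a 500 ms deadline.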


For sure, that's the definition of a real-time system, after all. But the conversation about what the long tail does to the "normal" path, and what it costs (both in money and in performance on the "normal" path), is quite simply something you can't back into.


Maybe I didn't understand you correctly, but you can at least have a "return an error on timeout" path and handle it with predictable logic. Or maybe you do have an architecture where no individual tardy request can affect the others. After all, I come from stream-processing systems, where there are only a few "users", each sending a constant stream of requests, and these users are interdependent (think control modules in a self-driving car).
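A "return an error on timeout" path can be sketched with Python's `asyncio.wait_for`, which cancels the awaited work once the timeout expires. The `Outcome` type, `call_with_deadline`, the stand-in `slow_backend`, and the `"fallback"` value are hypothetical names for illustration; the 0.2 s deadline simply mirrors the 200 ms figure discussed above. The point is that a tardy call turns into an explicit, bounded outcome the caller handles deterministically.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Optional

@dataclass
class Outcome:
    ok: bool
    value: Any = None
    error: Optional[str] = None

async def call_with_deadline(work: Awaitable[Any], deadline_s: float) -> Outcome:
    """Await `work`, but never longer than deadline_s; a timeout becomes an error value."""
    try:
        value = await asyncio.wait_for(work, timeout=deadline_s)
        return Outcome(ok=True, value=value)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the underlying work at this point.
        return Outcome(ok=False, error="timeout")

async def slow_backend(delay_s: float, value: str) -> str:
    """Hypothetical stand-in for a downstream dependency with variable latency."""
    await asyncio.sleep(delay_s)
    return value

async def handle(delay_s: float) -> str:
    outcome = await call_with_deadline(slow_backend(delay_s, "fresh"), deadline_s=0.2)
    if outcome.ok:
        return outcome.value
    # Predictable degraded path: every caller knows exactly what a timeout yields.
    return "fallback"

async def main() -> list:
    # One fast request and one tardy one; the tardy one is cut off at 0.2 s.
    return await asyncio.gather(handle(0.01), handle(1.0))

results = asyncio.run(main())
print(results)
```

Note that this only bounds the caller's wait. Whatever consumes `"fallback"` still has to be designed to accept it, which is the downstream implication raised in the reply.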


What I'm suggesting is that the decision about what you do in the case of long-tail performance problems is not something you can back into.

If you are going to have timeouts with handling logic, that has downstream implications. If you are going to have truly independent event loops, that is a fundamental architectural decision.

None of those things fit "make it work, then make it fast". You literally have to design them into the system from jump street, as they are part of the definition of "works".



