I am one of the maintainers of Skipper (https://opensource.zalando.com/skipper), an HTTP proxy library that can support similar use cases. We run it at Zalando (https://www.zalando.com/) in Kubernetes, where it lets developers connect to different kinds of data applications, including chat-based LLMs and notebooks. Of course we have OTel/OpenTracing support https://opensource.zalando.com/skipper/operation/operation/#....
Comparing against the round robin and least connections load balancing algorithms is probably not a fair choice. A better baseline would be consistent hashing, which naturally does stateful load balancing because the same key always lands on the same backend. In Skipper you can tune the behavior with the filters https://opensource.zalando.com/skipper/reference/filters/#co... and https://opensource.zalando.com/skipper/reference/filters/#co... per route.
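To illustrate why consistent hashing is the fairer baseline, here is a minimal Python sketch of a hash ring. It is not Skipper's implementation; the `HashRing` class, the `vnodes` parameter, and the backend addresses are all made up for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit integer that is stable across processes."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent hash ring with virtual nodes."""

    def __init__(self, backends, vnodes=100):
        # Each backend gets `vnodes` points on the ring to even out the load.
        self._points = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def pick(self, key: str) -> str:
        # Take the first ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
        return self._points[idx][1]


# Example addresses, assumed for illustration only.
ring = HashRing(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
backend = ring.pick("session-42")  # the same key always maps to the same backend
```

When a backend is removed, only the keys that lived on it move to another backend; every other key stays where it was, which is what keeps sessions sticky.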
If you don't want autoscaling, you can also limit the number of concurrent requests to a route, with queue support, and make sure backends are not overloaded by using the scheduler filters https://opensource.zalando.com/skipper/reference/filters/#sc....
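The idea of a concurrency limit with a bounded queue can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how Skipper implements it; `RouteLimiter`, `Rejected`, and the parameter names are hypothetical.

```python
import threading


class Rejected(Exception):
    """Raised when the queue is full or waiting for a slot times out."""


class RouteLimiter:
    """Hypothetical per-route limiter: N concurrent calls, bounded waiting queue."""

    def __init__(self, max_concurrency, max_queue, timeout):
        self._slots = threading.Semaphore(max_concurrency)
        self._max_queue = max_queue
        self._timeout = timeout  # seconds a queued request may wait
        self._waiting = 0
        self._lock = threading.Lock()

    def call(self, handler, *args):
        # Fast path: take a free slot without queueing.
        if not self._slots.acquire(blocking=False):
            with self._lock:
                if self._waiting >= self._max_queue:
                    raise Rejected("queue full")
                self._waiting += 1
            try:
                # Queued: wait for a slot, but give up after the timeout.
                if not self._slots.acquire(timeout=self._timeout):
                    raise Rejected("queue timeout")
            finally:
                with self._lock:
                    self._waiting -= 1
        try:
            return handler(*args)
        finally:
            self._slots.release()
```

A proxy would map `Rejected` to an error response so that excess load is shed at the edge instead of piling up on the backend. Note that a plain semaphore does not guarantee the order in which waiters are served.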
If you need more, you can also help yourself and use Lua filters to influence these options: https://opensource.zalando.com/skipper/reference/scripts/ .
We are happy to hear from you, Sandor