Launch HN: Odigos (YC W23) – Instant distributed tracing for Kubernetes clusters

cube2222 · on Jan 19, 2023

Wow, if this really works like you describe, then this is magic!

> Automatic instrumentation across programming languages is not a trivial task, especially when dealing with static binaries (like the ones produced by the Go compiler). We built multiple mechanisms to make sure we inject the relevant headers in a secure and stable way. We developed a system that tracks functions and structs across different versions of open-source libraries.

Could be very useful for non-greenfield projects. I'd love to learn more about the details, is there any writeup somewhere?

Though I'd still recommend new projects do "proper" tracing with not only one-per-service spans, but also spans for important functions, including additional application-specific tags, as that is easily 10x the value.

But since life is a sequence of tradeoffs, I think this project could be really useful in a lot of places.

phillipcarter · on Jan 19, 2023

> Though I'd still recommend new projects do "proper" tracing with not only one-per-service spans, but also spans for important functions, including additional application-specific tags, as that is easily 10x the value.

FWIW Odigos makes this possible because it uses OpenTelemetry (and generates OTel-compatible instrumentation for the eBPF-sourced data). You can go into an app that's instrumented this way, add an OpenTelemetry SDK, and start writing manual instrumentation or include additional instrumentation libraries. Your traces will just get deeper/richer when you do that.
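
To illustrate what "deeper/richer" means here, the following is a toy, standard-library-only sketch of a manual span nesting under an automatically created one. It does not use the OpenTelemetry API; `start_span`, the `source` field, and the span names are hypothetical and exist only for illustration.

```python
import contextvars
import secrets
from contextlib import contextmanager

# Tracks the span that is currently active in this execution context.
_current_span = contextvars.ContextVar("current_span", default=None)
finished = []  # spans, recorded in the order they end


@contextmanager
def start_span(name, source):
    """Open a span that becomes a child of whatever span is active."""
    parent = _current_span.get()
    span = {
        "name": name,
        "source": source,  # "auto" or "manual" (illustrative label only)
        "trace_id": parent["trace_id"] if parent else secrets.token_hex(16),
        "span_id": secrets.token_hex(8),
        "parent_id": parent["span_id"] if parent else None,
    }
    token = _current_span.set(span)
    try:
        yield span
    finally:
        _current_span.reset(token)
        finished.append(span)


# The outer span stands in for what automatic instrumentation would create
# for an incoming request; the inner one is what a developer adds by hand.
with start_span("GET /checkout", source="auto") as server_span:
    with start_span("apply_discounts", source="manual") as manual_span:
        pass
```

Because both spans read the same "current span" slot, the manual span inherits the trace ID and records the automatic span as its parent, which is why the two kinds of instrumentation end up in a single trace.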

edenfed · on Jan 19, 2023

We are actually doing a technical deep dive at the next OpenTelemetry Go auto-instrumentation meeting on Tuesday. Will be happy to share the presentation afterwards.

In addition, we automatically create spans for popular open source libraries in use, so you should also expect to see spans for database connections / cloud SDKs / Kafka clients / etc. Definitely agree that manual instrumentation is very important in addition to the automatic one.

intelVISA · on Jan 20, 2023

With tech like eBPF, dynamic instrumentation is actually surprisingly easy.

Still, always glad to see some innovation in this space.

mdaniel · on Jan 20, 2023

Congratulations on the launch, and thank you for choosing an awesome license!

For an unrelated reason, today I was reminded about Pixie (https://news.ycombinator.com/item?id=25375170 and https://news.ycombinator.com/item?id=31687978 and https://github.com/pixie-io/pixie#readme ), which says it is also an eBPF Kubernetes observability tool, also Apache licensed.

I suspect the difference may be your aspirations to move out of just kubernetes, but I wondered if that's the biggest difference between your project and theirs? Or maybe the C++ versus golang?

edenfed · on Jan 20, 2023

As far as I know, Pixie uses eBPF for generating metrics. Odigos is focused on generating distributed traces, which is a different signal that spans multiple applications.

thorgaardian · on Jan 19, 2023

Looks awesome! I hadn't had the chance to dive into eBPF yet, but I had hoped someone would be able to use it in a clever way like this!

I was digging through the docs and it looks like you have custom language detection. Did you consider trying to extract the language detection features from buildpack to do this? I imagine you'd get more reliable results and less to maintain if you used that as the basis.

edenfed · on Jan 19, 2023

Yes we are actually using a combination of env vars / process names / linked libraries and container metadata to detect the language
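
To make the signals listed above concrete, here is a minimal Python sketch of heuristic language detection. It is not Odigos's detector: the `ProcessInfo` type, the `detect_language` function, and the specific environment variables, executable names, and library names it checks are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessInfo:
    """Hypothetical snapshot of one process inside a container."""
    exe_name: str
    env: dict = field(default_factory=dict)
    linked_libs: list = field(default_factory=list)


# Illustrative hint tables (assumptions, not an exhaustive or official list).
ENV_HINTS = {
    "JAVA_HOME": "java",
    "JAVA_TOOL_OPTIONS": "java",
    "NODE_VERSION": "javascript",
    "NODE_ENV": "javascript",
    "PYTHON_VERSION": "python",
    "PYTHONPATH": "python",
}
EXE_HINTS = (("java", "java"), ("node", "javascript"), ("python", "python"))
LIB_HINTS = (("libjvm", "java"), ("libnode", "javascript"), ("libpython", "python"))


def detect_language(proc: ProcessInfo) -> str:
    """Guess the runtime from env vars, then process name, then linked libs."""
    for var, lang in ENV_HINTS.items():
        if var in proc.env:
            return lang

    exe = proc.exe_name.lower()
    for prefix, lang in EXE_HINTS:
        if exe.startswith(prefix):
            return lang

    for lib in proc.linked_libs:
        file_name = lib.rsplit("/", 1)[-1].lower()
        for prefix, lang in LIB_HINTS:
            if file_name.startswith(prefix):
                return lang

    # Statically linked binaries (for example Go) expose none of these
    # signals and would need a different check; this sketch gives up.
    return "unknown"
```

For example, `detect_language(ProcessInfo("server", {}, ["/usr/lib/libpython3.11.so.1.0"]))` returns `"python"` because only the linked-library signal matches.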

yashap · on Jan 20, 2023

Very cool!

I'd imagine the challenge here is the long tail of tracing and metric needs. I'm thinking things like:

- For the JVM, do you support things like thread pools and execution contexts well? e.g. say part of serving up a response to an HTTP request means executing some async work against an execution context, does the context propagation work properly? And if so, would this work for other JVM languages, like Scala, or just Java? When I've manually instrumented apps for context propagation, it's been easy for languages like JS (Node) and PHP, but hard for languages like Scala, where there are so many different concurrency models ppl use

- Some units of work/tracing are pretty standardized, like say serving up a response to an HTTP request. But others less so, for example work triggered by job queues/events, where essentially a message on some sort of Kafka/Redis/Postgres/whatever queue triggers your app to do some work (instead of an HTTP request). I have trouble seeing how Odigos would instrument this well - e.g. even if you detect the work, how do you label related metrics well (can't just rely on HTTP method/path)? How do you measure success/failure of the job? Or if you don't try to tackle this sort of use case, would there be something like Odigos libs for manual instrumentation, where necessary?
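
The thread pool concern in the first bullet can be shown in a few lines. This is a Python analogue of the JVM problem rather than JVM code, and `current_trace_id`, `async_work`, and `handle_request` are hypothetical names: work handed to a long-lived pool does not see the request's context unless something carries it across explicitly.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical holder for the trace ID of the request being served.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def async_work():
    # Runs on a pool thread and reports whichever trace ID it can see.
    return current_trace_id.get()


def handle_request(trace_id, pool):
    current_trace_id.set(trace_id)

    # Naive submit: the worker thread has its own context, created before
    # this request arrived, so the trace ID is not visible there.
    lost = pool.submit(async_work).result()

    # Explicit propagation: snapshot the caller's context and run the task
    # inside that snapshot on the worker thread.
    ctx = contextvars.copy_context()
    kept = pool.submit(ctx.run, async_work).result()
    return lost, kept


with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(lambda: None).result()  # start the worker before any request
    lost, kept = handle_request("trace-abc", pool)

print(lost, kept)  # prints: None trace-abc
```

Automatic instrumentation has to perform the equivalent of the `copy_context()` step for every concurrency primitive an application uses, which is why languages with many concurrency models are harder to cover.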

edenfed · on Jan 20, 2023

We are actually able to handle the long tail of tracing by leveraging the amazing open source community. For languages like Java we use the automatic instrumentation created by the OpenTelemetry community, which is really great and supports a ton of libraries. You can see a list of supported libraries here: https://github.com/open-telemetry/opentelemetry-java-instrum... This also allows us to support async tracing: context propagation over Kafka messages, for example, is something we support (depending on the programming language).
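
As a rough sketch of what context propagation over a message queue involves, the snippet below writes and reads a W3C Trace Context `traceparent` value in a list of `(key, bytes)` message headers, the shape Kafka record headers take. The `inject` and `extract` helpers are hypothetical and hand-rolled for illustration; real instrumentation would use a library's propagator rather than code like this.

```python
import re
import secrets

# traceparent layout: version "00", 32-hex trace ID, 16-hex parent span ID,
# 2-hex flags. The spec also rejects all-zero IDs; this sketch skips that.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled=True):
    """Producer side: attach the current span's identity to the message."""
    flags = "01" if sampled else "00"
    value = f"00-{trace_id}-{span_id}-{flags}"
    headers.append(("traceparent", value.encode("ascii")))


def extract(headers):
    """Consumer side: return (trace_id, parent_span_id, sampled) or None."""
    for key, raw in headers:
        if key != "traceparent":
            continue
        match = TRACEPARENT_RE.match(raw.decode("ascii", errors="replace"))
        if match:
            trace_id, parent_id, flags = match.groups()
            return trace_id, parent_id, bool(int(flags, 16) & 1)
    return None


# Producer creates IDs and stamps the outgoing message.
trace_id = secrets.token_hex(16)      # 32 hex characters
producer_span = secrets.token_hex(8)  # 16 hex characters
message_headers = []
inject(message_headers, trace_id, producer_span)

# Consumer recovers them and can start a child span in the same trace.
remote_ctx = extract(message_headers)
```

The consumer's span would reuse `trace_id` and record `producer_span` as its parent, so the work triggered by the message shows up in the same trace as the code that published it.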

yashap · on Jan 20, 2023

Ah cool! So like, if I used some Open Telemetry libs for more manual instrumentation, would it "play nice" with the automated instrumentation? Like say:

- I instrument a Scala app with Odigos, and it handles say 90% of the metrics, trace spans, etc. that I want

- But I want to add some extra spans, extra metrics

- If I then explicitly add OpenTelemetry libs as dependencies, will they conflict with the automated OpenTelemetry instrumentation (e.g. no "JVM dependency hell" issues like "I manually add 2.x of this lib, but then Odigos monkey patches it to 1.x, breaking my manual instrumentation")? And is there a way for the manual instrumentation to "play nice" with the automated instrumentation, e.g. I choose destinations in the Odigos UI for where to send traces, metrics, etc., is there a way for me to sort of have my manual instrumentation automatically target the same destinations?

Obviously you guys are an early stage startup, if there's no clear answer on hand for some of these questions, I'd just have to try and see, that's totally fair too :) I do love this idea of crazy easy 1-click style instrumentation.

edenfed · on Jan 20, 2023

Yes, exactly. Odigos plays nice with manual instrumentation, meaning distributed traces will include both automatic and manual spans. Currently there is no way to point manual instrumentation to the destination selected in Odigos, but we are working on it and should release it soon. Most SDKs have a concept of a no-op exporter; that way Odigos will be able to pick up the manually created traces and deliver them to the chosen backend.

yashap · on Jan 20, 2023

Very cool, ty for the responses!

jedberg · on Jan 19, 2023

This is awesome! Request tracing is basically the fundamental building block to observability in a distributed system.

Doing it automatically is a huge win!

Congrats on the launch and I look forward to learning more!

edenfed · on Jan 20, 2023

Thank you for the feedback! We believe a lot of innovation can happen with distributed traces, and Odigos is just the beginning

stavros · on Jan 19, 2023

I am very amused by your choice of name, as Odigos is to land what Kubernetes is to sea.

yasuocidal · on Jan 20, 2023

In Greek it means Driver, the one who drives lol. Edit: just saw your username, I'm pretty sure you already knew xD

stavros · on Jan 20, 2023

Haha yes, and Kubernetes is the guy who drives ships.

jzelinskie · on Jan 19, 2023

Congrats on the launch! OpenTelemetry/Distributed Tracing has been in dire need of quality of life improvements, so I'm glad to see more folks filling in the gaps.

I see you're injecting trace IDs into programs. How do you guarantee that this doesn't break the binary or flag any security/compliance requirements?

edenfed · on Jan 19, 2023

This is something we are thinking about a lot. We developed multiple mechanisms to make sure we inject the IDs in a safe way. You can see the code here: https://github.com/keyval-dev/opentelemetry-go-instrumentati...

phillipcarter · on Jan 19, 2023

> dire need of quality of life improvements

Agreed! I'm one of the maintainers of part of the project - what sorts of things are top of mind for you w.r.t. quality of life improvements?

Benjamin_Dobell · on Jan 19, 2023

This is really cool. Upon further Googling, readers may be interested in https://kubernetes.io/blog/2017/12/using-ebpf-in-kubernetes/

If you can go beyond Kubernetes, I think that'd give Odigos more staying power. Naturally some integrations are out of your hands, AWS Fargate being one (https://github.com/aws/containers-roadmap/issues/1027). However, if you could get integrations up and running with the likes of Fargate, Fly.io, Render.com, etc., that'd be amazing.

edenfed · on Jan 19, 2023

Support for non-Kubernetes environments is something we are planning to release very soon.

william-evans · on Jan 19, 2023

This is really cool - given my perception of the target market it might be worth targeting AWS Elastic Container Service (ECS) next as the userbase there, I would imagine, is generally looking for less-complex solutions (given the complexity difference between Kubernetes and ECS).

edenfed · on Jan 19, 2023

ECS is definitely on our roadmap!

avinassh · on Jan 22, 2023

I was thinking of giving it a try, but why does it have Datadog as a prerequisite?

> A Datadog account with API key. Go to Datadog website to create a new free account. In addition, create a new API key by navigating to Organization settings, then click on API keys, and create a new key.

https://docs.odigos.io/prerequisites

tecleandor · on Jan 23, 2023

I think that prerequisite is only for that tutorial.

If I understood correctly, Odigos supports a bunch of observability backends and, instead of Datadog, you could use Jaeger, Splunk or OpenTelemetry (for example).

https://github.com/keyval-dev/odigos/blob/main/DESTINATIONS....

edenfed · on Jan 23, 2023

This is not a requirement, sorry for the misleading documentation. We just rewrote everything, and this bullet is probably a leftover from a previous version of the docs. Fixing it now.

nate908 · on Jan 19, 2023

Interesting, it looks like you've put some hard work into this project. My question is, what if a pod has multiple containers in it? How does Odigos choose which icon/programming language is displayed for the pod? For example, I have a Deployment that runs pods with two containers: a php-fpm container and an nginx container. Would the "Choose Target Applications" page show an icon for both Nginx and PHP for the given Deployment? Would Odigos report separate metrics to the backend Destination for both PHP and Nginx?

edenfed · on Jan 19, 2023

Odigos will be able to instrument both containers each with the relevant instrumentation. As you pointed out, there is currently a bug in the UI that shows just one programming language per pod. Working on fixing it soon

rapidlua · on Jan 20, 2023

The BPF instrumentation is quite cool! I wonder if uprobes have a performance impact. Does it roughly compare to a single syscall?

https://github.com/keyval-dev/opentelemetry-go-instrumentati...

Bayart · on Jan 20, 2023

Really nice! I was looking into implementing tracing for a few projects I'm being onboarded on, and it seems to solve the "ask the devs nicely to implement OpenTelemetry or at least merge my commits" part.

My "gut instinct" would be to export that to Jaeger, but I'm open to suggestions as to better alternatives. We're on GCP so it might be an opportunity to try Google Cloud Trace as well.

massimosgrelli · on Jan 20, 2023

Amazing project. Is everything open source? Are you planning anything for the big enterprise who wants to pay for the service?

edenfed · on Jan 20, 2023

Thank you! Yes, we are working on adding enterprise features.

ygouzerh · on Jan 20, 2023

Wow, congrats guys, that is a game changer! That has the potential to become a standard at some companies I've worked with.

earthling8118 · on Jan 20, 2023

Are there any plans to branch out past Kubernetes? I'd be very interested in Odigos but I have separated myself from Kubernetes and am all in on Nomad. I've been looking at how I want to handle tracing and telemetry and this seems it'd be a great fit except for that minor detail

edenfed · on Jan 20, 2023

Definitely. Nomad is probably the first environment we are going to support after Kubernetes

Ancient · on Jan 19, 2023

Just saw the demo video, looks awesome. Is this tool from the future or some dark wizard tricks? Keep up the great work.

hinkley · on Jan 19, 2023

Distributed tracing really ought to be built into every web application framework. What's the value in signing over your autonomy to a framework if it isn't going to handle cross-cutting concerns like forwarding correlation IDs from the inbound request to all outbound requests triggered by that request?

edenfed · on Jan 19, 2023

Unfortunately, not all web frameworks do this automatically. In addition, sometimes you may want to propagate IDs over non-HTTP connections like database drivers or even message queues.

decisionSniper · on Jan 19, 2023

I'm curious how this will stack up against Sysdig/Falco - https://sysdig.com/blog/sysdig-and-falco-now-powered-by-ebpf....

eBPF for the win, this is a nice approach with Odigos.

debarshri · on Jan 19, 2023

I don't think it is comparable to Falco. Falco is more about security violations of the container. It is not related to distributed tracing.

edenfed · on Jan 19, 2023

Falco is a really cool project, but it focuses more on security. Odigos is focused on getting better monitoring signals from your applications, especially distributed tracing.

theptip · on Jan 19, 2023

Looks cool! Great to see entrants into this space.

How does this compare with Cilium? Looks like they do OT tracing (https://github.com/cilium/hubble-otel) but it's not native/core, is that the main distinction?

edenfed · on Jan 19, 2023

As far as I know, Cilium does not do automatic context propagation and requires code changes to achieve it. Odigos does context propagation automatically.

Gary_TheSnail_ · on Jan 20, 2023

Are there available positions for hire in the company? This sounds really interesting!

edenfed · on Jan 20, 2023

Not yet, but hopefully soon. Thank you for the feedback!

pranay01 · on Jan 19, 2023

Congrats on the launch! Glad to see that you also support SigNoz as a backend.

allisdust · on Jan 20, 2023

Is it possible to use this for non-Kubernetes setups (for example, a single Docker container or a single server)?

edenfed · on Jan 20, 2023

Not yet, but I can set up a custom Docker Compose YAML for you depending on the programming language you are using.

allisdust · on Jan 20, 2023

I'm using Rust (with the warp framework, if it helps). I can help test if there is a Docker Compose file :)

talhof8 · on Jan 20, 2023

Good luck, looks amazing!

Always fun to see Israeli names here, good luck!