[flagged] Show HN: Python at the Speed of Rust (fxn.ai)
68 points by olokobayusuf 24 days ago | 40 comments
I’m sure many of you are familiar, but there’s a treacherous gap between finding (or building) a model that works in PyTorch, and getting that deployed into your application, especially in consumer-facing applications.

I’ve been very interested in solving this problem with a great developer experience. Over time, I gradually realized that the highest-impact thing to have was a way to go from existing Python code to a self-contained native binary—in other words, a Python compiler.

I was already pretty familiar with a successful attempt: when Apple introduced armv8 on the iPhone 5s, they quickly mandated 64-bit support for all apps. Unity—where I had been programming since I was 11—kinda got f*cked because they used Mono to run developers’ C# code, and Mono didn’t support 64-bit ARM. Unity ended up building IL2CPP, which transpiles the C# intermediate language into C++, then cross-compiles it. To date, this is perhaps the most amazing technical feat Unity has achieved, imo.

I set out to build something similar, but this time starting from Python. It’s a pretty difficult problem, given the dynamic nature of the language. The key unlock was the PyTorch 2.0 release, where they pioneered the use of symbolic tracing to power `torch.compile`. In a nutshell, they register a callback with the Python interpreter (using CPython’s frame evaluation API), run a function with fake inputs, and record an IR graph of everything that happened in the function.
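For anyone curious what that capture step looks like from user land, here's a rough sketch using `torch.compile` with a custom backend (the function and backend names here are made-up placeholders): TorchDynamo hooks frame evaluation, runs the function once, and hands the backend the recorded FX graph.

  import torch

  def inspect_backend(gm, example_inputs):
      # gm is a torch.fx.GraphModule recorded by TorchDynamo
      print(gm.graph)    # placeholder, call_function, and output nodes
      return gm.forward  # hand the graph back unchanged for execution

  @torch.compile(backend=inspect_backend)
  def scale_and_add(x, y):
      return x * 2.0 + y

  scale_and_add(torch.randn(4), torch.randn(4))  # tracing happens on first call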

Once you have an IR graph, you can lower it to C++/Rust code, operation-by-operation, by propagating type information throughout the program (see the blog post for an example). And now is the perfect time to have this infrastructure, because LLMs can do all the hard work of writing and validating the required operations in native code.
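As a toy illustration of that lowering idea (emphatically not Function's actual codegen, just a sketch): walk the traced FX graph node by node and emit one line of C++ per operation, assuming every value is a scalar float.

  import operator
  import torch.fx

  CPP_OPS = {operator.mul: "*", operator.add: "+", operator.sub: "-"}

  def scale_and_add(x, y):
      return x * 2.0 + y

  gm = torch.fx.symbolic_trace(scale_and_add)  # graph of mul and add nodes

  params, lines = [], []
  for node in gm.graph.nodes:
      if node.op == "placeholder":      # function arguments
          params.append(f"float {node.name}")
      elif node.op == "call_function":  # one C++ statement per operation
          a, b = node.args
          lines.append(f"float {node.name} = {a} {CPP_OPS[node.target]} {b};")
      elif node.op == "output":         # return the final value
          lines.append(f"return {node.args[0]};")

  print(f"float kernel({', '.join(params)}) {{\n  " + "\n  ".join(lines) + "\n}")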

Anyway, I wanted to share the proof-of-concept and gather feedback. Using Function is pretty simple: decorate a module-level function with `@compile`, then use the CLI to compile it: `fxn compile module.py`.
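For concreteness, here's roughly what that looks like (the decorator's import path below is an assumption; check the docs for the canonical form):

  # module.py — hypothetical example; the decorator's import path is assumed
  from fxn import compile

  @compile
  def fma(x: float, y: float, z: float) -> float:
      """Fused multiply-add, compiled ahead-of-time into a native binary."""
      return x * y + z

Then `fxn compile module.py` produces the self-contained binary.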

TL;DR: Get Rust performance without having to learn Rust ;)




> But what if we could compile Python into raw native code?

Am I missing something or has that been possible for at least 20 years?


I'm with you. I thought this was going to be an article about Numba.


I think a more pedantic way to describe what I mean is:

"What if we could compile Python into raw native code *without having a Python interpreter*?"

The key distinguishing feature of this compiler is being able to make standalone, cross-platform native binaries from Python code. Numba will fall back to using the Python interpreter for code that it can't JIT.
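To make the contrast concrete, here's a rough sketch of the Numba side: `@njit` (nopython mode) refuses to compile anything it can't lower, and plain `@jit` historically fell back to object mode through the interpreter; either way, the result runs inside a live CPython process rather than as a standalone binary.

  import numba
  import numpy as np

  @numba.njit  # nopython mode: compile error if any op can't be lowered
  def dot(a, b):
      total = 0.0
      for i in range(a.shape[0]):
          total += a[i] * b[i]
      return total

  x = np.arange(5, dtype=np.float64)
  print(dot(x, x))  # JIT-compiled to native code, but hosted by the interpreter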


This is akin to re-implementing the complete language.

Writing programming languages before AI was a bit of a daunting task; now it's way easier to grasp the first principles and dive through. It would still take time to get something production-ready, but that's definitely something that could happen.


Spot on!

The majority of the innovation here is in building enough rails (specifically around lowering Python's language features to native code) so that LLM codegen can help you transform any Python code into equivalent native code (C++ and Rust in our case).


I'm confused about the introduction mentioning API keys? The function is compiled on and/or uploaded to your servers?

The introduction could benefit from discussing a less toyish problem.


Yes, we upload user code to a cloud sandbox in order to run our symbolic tracing and code generation algorithm.

Beyond that, we also compile the generated native code in the cloud so that devs don't have to have a cross-compiler installed on their system (e.g. Clang, Xcode, MSVC, and so on).


I think the compilation flags used need to be stated in order to properly judge the validity of the microbenchmark here.

At opt-levels 2 and 3, Rust completely optimizes away the inner loop, leaving just two instructions (plus a check for n=0). Given that your graph isn't showing O(1) time complexity, you either:

1) didn't compile your Rust code in release mode for benchmarking (the release profile uses opt-level=3), or 2) intentionally downgraded the opt-level of the release profile in order to run these tests.

I'd be interested to see a more proper example, one where the task itself has some non-zero time complexity. Show a quicksort implementation perhaps. But as it stands this is just marketing smoke and mirrors.


Way ahead of you: https://github.com/olokobayusuf/python-vs-rust/blob/main/Car...

I've clarified that this is not designed to be a rigorous benchmark. We've got rigorous benchmarks coming for image processing and CNN inference. I'll reply with the image processing example benchmark this week.


> Get Rust performance without having to learn Rust

I do that by using the Python C-API, no rust, no fuss...

Thinking about it, I haven't tried to see how well the robots do with wrapping C libraries -- though I usually use pybindgen to generate the initial py-module then fiddle with the code manually.

Other than that Lisp implementation, has anyone used the Python API which lets you register a module as the 'compiler' for external file types? I forget its actual name.


Yup but you're skilled enough to write--and more importantly, maintain--the required C/C++ code. Most devs and companies we talk to just want to make something people want; they don't care for the added complexity of writing and maintaining native code.

The way I like to think about this is how much more code got written when "high level" languages like C came onto the scene, at a time when Assembly was the default. Writing Python is way (way way) easier and faster than any of the lower-level languages--no pointers, no borrow checker!


Others have been here before. You'll have to deal with:

  * C-API
  * Have a language spec
  * Deal with differences in pattern matching 
  * Translating library calls
  * Compete with LLMs doing a better job converting for some use cases


Not entirely sure what you mean by having to deal with C-API and having a language spec.

We're also not competing with LLMs at all--we use LLMs for said conversion (under strict verification requirements).


Much of the stdlib is written in C.

What happens when you use @compile on a function that calls into stdlib directly or indirectly? How do you deal with existing extensions, GIL etc?

If you accept that 100% of legal Python is not accepted, you have to write down exactly what is accepted.

Example: https://github.com/py2many/py2many/blob/main/doc/langspec.md


When we trace Python code, devs have to explicitly opt dependency modules in to tracing. Specifically, the `@compile` decorator has a `trace_modules` parameter which is a `list[types.ModuleType]`.

With this in place, when we trace through a dev's function, a given function call is considered a leaf node unless the function's containing module is in `trace_modules`. This covers the Python stdlib.

We then take all leaf nodes, look up their equivalent implementations in native code (written and validated by us), and use those native implementations for codegen.
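A rough sketch of what that opt-in looks like (the decorator's import path is assumed; see our docs for the canonical form):

  import math
  import statistics
  from fxn import compile  # import path assumed

  @compile(trace_modules=[statistics])   # opt the statistics module in to tracing
  def zscore(x: float, sample: list[float]) -> float:
      mean = statistics.fmean(sample)    # traced through: statistics is opted in
      var = statistics.pvariance(sample) # traced through as well
      return (x - mean) / math.sqrt(var) # math isn't opted in, so this call is a
                                         # leaf node mapped to our native sqrt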

We don't interact with the GIL. And we keep track of what is unsupported so far: https://docs.fxn.ai/predictors/requirements#language-coverag...


So will this only work with standard Python?

If I start pulling in Numpy, igraph, scikit, etc, will it still work?

The article mentions Numpy as a hang-up when trying to use Python across multiple systems, but it doesn't address whether Function deals with this.

Looks cool, regardless!


Actually we're currently implementing Numpy (and PyTorch) support, and will cover a few other core scientific computing libraries like scipy. See docs: https://docs.fxn.ai/predictors/requirements#library-coverage


Awesome!


How does this compare to a project like Mojo, which also seems to aim at filling that "treacherous gap" you note? They seem to believe that they can't just write a Python compiler, why are you more confident?


I think we and Modular have incredibly similar visions of the end state of the world. The main difference is that they require devs to learn a new programming language (Mojo), whereas Function is designed to meet devs exactly where they are--not an inch away.

More fundamental than that is that Mojo has somewhat of a built-in assumption (I'm a bit more skeptical) that they can outperform silicon OEMs like Nvidia using MLIR (Mojo is designed as a front-end to MLIR). We have a much more conservative view: we'll rely on silicon OEMs to give us libraries we can use to accelerate things like inference; and we'll provide devs the ability to inspect and modify the generated code (C++ atm, Rust soon).

TL;DR: Devs don't have to learn anything with Function. Devs have to learn Mojo and MLIR to use Modular's offerings properly.


Also worth noting that we're better positioned for the coming wave of LLMs writing the vast majority of code in production. Function is designed specifically for LLMs to do the transpilation step (Function simply provides rails to stack these unitary operations into user-defined functions).

Mojo has a cold-start problem here, cos there isn't enough Mojo code in the wild for LLMs to be great at writing it in volume (compare this to C++ and Rust).


Makes sense, that's definitely a downside with Mojo. But my understanding is they made this tradeoff because compiling Python code directly would not work out. So is this something that works only at the function level (hence the name), and you don't plan to do this for full programs?


Spot on! We're starting from specific functions for now, and will expand the scope if/when feasible. The challenges exist across two axes: language coverage and library coverage.

For language coverage, Function currently doesn't support classes (on the roadmap) or lambda expressions (much harder). But these are the main limitations.

For library coverage, this is where things get incredibly hairy. If a library isn't pure Python (i.e. most of the important libs), we can't compile them. We instead have to reimplement the library's functionality. For now, we plan on using LLMs to automate this as much as possible.


>If a library isn't pure Python (i.e. most of the important libs), we can't compile them.

If the library is written in C, like Numpy, can't you just link against it?


Not quite.

First, Function is designed to be truly cross-platform but libraries like Numpy aren't compiled for say WebAssembly.

Second, the native libraries are usually built around CPython interop (i.e. the C API expects to interact with the CPython interpreter). Function does not (and will never) have a CPython interpreter (we generate full AOT compiled code).


Super cool! I've been a fan of this project for a while. I think this is super important for anyone who is trying to do ML inference on edge hardware, and in the browser too.


Thank you <3


Not sure about the LLM part. I still prefer my transformations to be done by LLVM, which in many cases is rigorously proven to be correct, instead of hopefully not hallucinating.


We use LLVM!

We're not translating Python directly to LLVM IR (I think I've seen other projects do this). We translate Python to C++/Rust first, where we have rigorous unit tests for every operation we support translating. We then use LLVM for downstream compilation to object code.

Here's some more context: https://docs.fxn.ai/predictors/compiler


How come performance is worse than Rust? Isn't this misleading?


It's marginal. The difference exists because Function adds syntactic sugar to make the developer experience of calling the compiled functions easy and smooth. The Function call isn't a direct call, whereas the Rust one is.

It's possible to hack Function to perform a direct call and avoid (most of) the overhead, but in 99.99% of use cases this won't matter cos most processing time will be spent in the body of the function, not the scaffolding that Function uses.


Yeah, Python at the Speed of Rust if you ignore the unavoidable overhead of still using Python.


There's actually no overhead from "still using Python" because we don't use Python. The overhead in Function exists solely because we add a bunch of sugar to create a unified interface for calling different kinds of functions (e.g. generator functions).

You don't have to take my word for it: you can pull the C++ source code that we generate from a given Python function and inspect it yourself: https://docs.fxn.ai/predictors/compiler#compiling-binaries


Have you seen Mojo?



More serious reply: we and Mojo have similar visions of where the world is going. The key difference between us is that Mojo is itself a new programming language. Sure, it supports Python, but it doesn't actually compile Python. It simply delegates all interactions with Python code to the CPython interpreter: https://docs.modular.com/mojo/why-mojo/#compatibility-with-p...

But beyond that, our goal is meeting devs where they are. This means that beyond just compiling Python code, we provide SDKs for different frameworks (JavaScript, Kotlin, Swift, React Native, Unity, etc) that devs can use to run these functions within their applications, in as little as two lines of code.

We're very (very) focused on developers shipping products that use Function. We're already embedded in web apps, apps on the App Store, Play Store, and other places.


What's the end goal of this? Would I be able to compile any Python program? Will I be able to compile Django or FastApi apps?


We're focused specifically on on-device AI inference (and related compute-bound algorithms, like computer vision or scientific computing).

We want devs to find or develop inference code (always in Python); decorate it with Function's `@compile`; compile it; and run natively in their Android, iOS, macOS, Linux, WebAssembly, or Windows applications.

We won't be supporting other use cases like web servers (no Django or FastAPI).


Why are you doing numerical code in single precision? If no one in your group did a numerical analysis course at college it’s time you moved on to think about some other problem more suited to your experience set.


No need to be snarky. The choice to default to FP32 is inspired by the fact that most typical use cases don't need double-precision (GPU shader languages and game engines do this all the time). This in turn allows us to vectorize code for 2x throughput compared to using FP64. We're gonna add a flag to change the default floating-point precision for devs who need extra precision.

See docs: https://docs.fxn.ai/predictors/requirements#floating-point-v...
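If you want a rough feel for where the 2x comes from: a 256-bit SIMD register holds 8 float32 lanes but only 4 float64 lanes, and the data itself takes half the memory bandwidth. A quick check with plain NumPy (nothing Function-specific):

  import numpy as np

  a32 = np.zeros(1_000_000, dtype=np.float32)
  a64 = np.zeros(1_000_000, dtype=np.float64)
  print(a32.nbytes, a64.nbytes)  # 4000000 vs 8000000 bytes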



