[flagged] Show HN: Python at the Speed of Rust (fxn.ai)
68 points by olokobayusuf 24 days ago | 40 comments
I’m sure many of you are familiar, but there’s a treacherous gap between finding (or building) a model that works in PyTorch, and getting that deployed into your application, especially in consumer-facing applications.

I’ve been very interested in solving this problem with a great developer experience. Over time, I gradually realized that the highest-impact thing to have was a way to go from existing Python code to a self-contained native binary—in other words, a Python compiler.

I was already pretty familiar with a successful attempt: when Apple introduced armv8 on the iPhone 5s, they quickly mandated 64-bit support for all apps. Unity—where I had been programming since I was 11—kinda got f*cked because they used Mono to run developers’ C# code, and Mono didn’t support 64-bit ARM. Unity ended up building IL2CPP, which transpiles the C# intermediate language into C++, then cross-compiles it. To date, this is perhaps the most amazing technical feat Unity has achieved, imo.

I set out to build something similar, but this time starting from Python. It’s a pretty difficult problem, given the dynamic nature of the language. The key unlock was the PyTorch 2.0 release, where they pioneered the use of symbolic tracing to power `torch.compile`. In a nutshell, they register a callback with the Python interpreter (using CPython’s frame evaluation API), run a function with fake inputs, and record an IR graph of everything that happened in the function.
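For anyone curious what that capture step looks like from user land, here's a rough sketch using `torch.compile` with a custom backend (the function and backend names here are made-up placeholders): TorchDynamo hooks frame evaluation, runs the function once, and hands the backend the recorded FX graph.

  import torch

  def inspect_backend(gm, example_inputs):
      # gm is a torch.fx.GraphModule recorded by TorchDynamo
      print(gm.graph)    # placeholder, call_function, and output nodes
      return gm.forward  # hand the graph back unchanged for execution

  @torch.compile(backend=inspect_backend)
  def scale_and_add(x, y):
      return x * 2.0 + y

  scale_and_add(torch.randn(4), torch.randn(4))  # tracing happens on first call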

Once you have an IR graph, you can lower it to C++/Rust code, operation-by-operation, by propagating type information throughout the program (see the blog post for an example). And now is the perfect time to have this infrastructure, because LLMs can do all the hard work of writing and validating the required operations in native code.
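As a toy illustration of that lowering idea (emphatically not Function's actual codegen, just a sketch): walk the traced FX graph node by node and emit one line of C++ per operation, assuming every value is a scalar float.

  import operator
  import torch.fx

  CPP_OPS = {operator.mul: "*", operator.add: "+", operator.sub: "-"}

  def scale_and_add(x, y):
      return x * 2.0 + y

  gm = torch.fx.symbolic_trace(scale_and_add)  # graph of mul and add nodes

  params, lines = [], []
  for node in gm.graph.nodes:
      if node.op == "placeholder":      # function arguments
          params.append(f"float {node.name}")
      elif node.op == "call_function":  # one C++ statement per operation
          a, b = node.args
          lines.append(f"float {node.name} = {a} {CPP_OPS[node.target]} {b};")
      elif node.op == "output":         # return the final value
          lines.append(f"return {node.args[0]};")

  print(f"float kernel({', '.join(params)}) {{\n  " + "\n  ".join(lines) + "\n}")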

Anyway, I wanted to share the proof-of-concept and gather feedback. Using Function is pretty simple: decorate a module-level function with `@compile`, then use the CLI to compile it: `fxn compile module.py`.
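For concreteness, here's roughly what that looks like (the decorator's import path below is an assumption; check the docs for the canonical form):

  # module.py — hypothetical example; the decorator's import path is assumed
  from fxn import compile

  @compile
  def fma(x: float, y: float, z: float) -> float:
      """Fused multiply-add, compiled ahead-of-time into a native binary."""
      return x * y + z

Then `fxn compile module.py` produces the self-contained binary.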

TL;DR: Get Rust performance without having to learn Rust ;)




> But what if we could compile Python into raw native code?

Am I missing something or has that been possible for at least 20 years?


I'm with you. I thought this was going to be an article about Numba.


I think a more pedantic way to describe what I mean is:

"What if we could compile Python into raw native code *without having a Python interpreter*?"

The key distinguishing feature of this compiler is being able to make standalone, cross-platform native binaries from Python code. Numba will fall back to using the Python interpreter for code that it can't JIT.
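To make the contrast concrete, here's a rough sketch of the Numba side: `@njit` (nopython mode) refuses to compile anything it can't lower, and plain `@jit` historically fell back to object mode through the interpreter; either way, the result runs inside a live CPython process rather than as a standalone binary.

  import numba
  import numpy as np

  @numba.njit  # nopython mode: compile error if any op can't be lowered
  def dot(a, b):
      total = 0.0
      for i in range(a.shape[0]):
          total += a[i] * b[i]
      return total

  x = np.arange(5, dtype=np.float64)
  print(dot(x, x))  # JIT-compiled to native code, but hosted by the interpreter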


This is akin to re-implementing the complete language.

Writing programming languages before AI was a bit of a daunting task; now it's way easier to grasp the first principles and dive through. It would still take time to get something production-ready, but that's definitely something that could happen.


Spot on!

The majority of the innovation here is in building enough rails (specifically around lowering Python's language features to native code) so that LLM codegen can help you transform any Python code into equivalent native code (C++ and Rust in our case).


I'm confused about the introduction mentioning API keys? The function is compiled on and/or uploaded to your servers?

The introduction could benefit from discussing a less toyish problem.


Yes, we upload user code to a cloud sandbox in order to run our symbolic tracing and code generation algorithm.

Beyond that, we also compile the generated native code in the cloud so that devs don't have to have a cross-compiler installed on their system (e.g. Clang, Xcode, MSVC, and so on).


I think the compilation flags used need to be stated in order to properly judge the validity of the microbenchmark here.

At opt-levels 2 and 3, Rust completely optimizes away the inner loop, leaving just two instructions (plus a check for n=0). Given that your graph isn't showing O(1) time complexity, you either:

1) didn't compile your Rust code in release mode for benchmarking (the release profile uses opt-level=3), or 2) intentionally downgraded the opt-level of the release profile in order to run these tests.

I'd be interested to see a more proper example, one where the task itself has some non-zero time complexity. Show a quicksort implementation perhaps. But as it stands this is just marketing smoke and mirrors.


Way ahead of you: https://github.com/olokobayusuf/python-vs-rust/blob/main/Car...

I've clarified that this is not designed to be a rigorous benchmark. We've got rigorous benchmarks coming for image processing and CNN inference. I'll reply with the image processing example benchmark this week.


> Get Rust performance without having to learn Rust

I do that by using the Python C-API, no rust, no fuss...

Thinking about it, I haven't tried to see how well the robots do with wrapping C libraries -- though I usually use pybindgen to generate the initial py-module then fiddle with the code manually.

Other than that Lisp implementation, has anyone used the Python API which lets you register a module as the 'compiler' for external file types? I forget its actual name.


Yup but you're skilled enough to write--and more importantly, maintain--the required C/C++ code. Most devs and companies we talk to just want to make something people want; they don't care for the added complexity of writing and maintaining native code.

The way I like to think about this is how much more code got written when "high level" languages like C came onto the scene, at a time when Assembly was the default. Writing Python is way (way way) easier and faster than any of the lower-level languages--no pointers, no borrow checker!


Others have been here before. You'll have to deal with:

  * C-API
  * Have a language spec
  * Deal with differences in pattern matching 
  * Translating library calls
  * Compete with LLMs doing a better job converting for some use cases


Not entirely sure what you mean by having to deal with C-API and having a language spec.

We're also not competing with LLMs at all--we use LLMs for said conversion (under strict verification requirements).


Much of the stdlib is written in C.

What happens when you use @compile on a function that calls into stdlib directly or indirectly? How do you deal with existing extensions, GIL etc?

If you accept that 100% of legal Python is not accepted, you have to write down exactly what is accepted.

Example: https://github.com/py2many/py2many/blob/main/doc/langspec.md


When we trace Python code, devs have to explicitly opt dependency modules in to tracing. Specifically, the `@compile` decorator has a `trace_modules` parameter which is a `list[types.ModuleType]`.

With this in place, when we trace through a dev's function, a given function call is considered a leaf node unless the function's containing module is in `trace_modules`. This covers the Python stdlib.

We then take all leaf nodes, look up their equivalent implementations in native code (written and validated by us), and use those native implementations for codegen.
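A rough sketch of what that opt-in looks like (the decorator's import path is assumed; see our docs for the canonical form):

  import math
  import statistics
  from fxn import compile  # import path assumed

  @compile(trace_modules=[statistics])   # opt the statistics module in to tracing
  def zscore(x: float, sample: list[float]) -> float:
      mean = statistics.fmean(sample)    # traced through: statistics is opted in
      var = statistics.pvariance(sample) # traced through as well
      return (x - mean) / math.sqrt(var) # math isn't opted in, so this call is a
                                         # leaf node mapped to our native sqrt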

We don't interact with the GIL. And we keep track of what is unsupported so far: https://docs.fxn.ai/predictors/requirements#language-coverag...


So will this only work with standard Python?

If I start pulling in Numpy, igraph, scikit, etc, will it still work?

The article mentions Numpy as a hang-up when trying to use Python across multiple systems, but it doesn't address whether Function deals with this.

Looks cool, regardless!


Actually we're currently implementing Numpy (and PyTorch) support, and will cover a few other core scientific computing libraries like scipy. See docs: https://docs.fxn.ai/predictors/requirements#library-coverage


Awesome!


How does this compare to a project like Mojo, which also seems to aim at filling that "treacherous gap" you note? They seem to believe that they can't just write a Python compiler, why are you more confident?


I think we and Modular have incredibly similar visions of the end state of the world. The main difference is that they require devs to learn a new programming language (Mojo), whereas Function is designed to meet devs exactly where they are--not an inch away.

More fundamental than that is that Mojo has somewhat of a built-in assumption (I'm a bit more skeptical) that they can outperform silicon OEMs like Nvidia using MLIR (Mojo is designed as a front-end to MLIR). We have a much more conservative view: we'll rely on silicon OEMs to give us libraries we can use to accelerate things like inference; and we'll provide devs the ability to inspect and modify the generated code (C++ atm, Rust soon).

TL;DR: Devs don't have to learn anything with Function. Devs have to learn Mojo and MLIR to use Modular's offerings properly.


Also worth noting that we're better positioned for the coming wave of LLMs writing the vast majority of code in production. Function is designed specifically for LLMs to do the transpilation step (Function simply provides rails to stack these unitary operations into user-defined functions).

Mojo has a cold-start problem here, cos there isn't enough Mojo code in the wild for LLMs to be great at writing it in volume (compare this to C++ and Rust).


Makes sense, that's definitely a downside with Mojo. But my understanding is they made this tradeoff because compiling Python code directly would not work out. So is this something that works only at the function level (hence the name), and you don't plan to do this for full programs?


Spot on! We're starting from specific functions for now, and will expand the scope if/when feasible. The challenges exist across two axes: language coverage and library coverage.

For language coverage, Function currently doesn't support classes (on the roadmap) or lambda expressions (much harder). But these are the main limitations.

For library coverage, this is where things get incredibly hairy. If a library isn't pure Python (i.e. most of the important libs), we can't compile them. We instead have to reimplement the library's functionality. For now, we plan on using LLMs to automate this as much as possible.


>If a library isn't pure Python (i.e. most of the important libs), we can't compile them.

If the library is written in C, like Numpy, can't you just link against it?


Not quite.

First, Function is designed to be truly cross-platform but libraries like Numpy aren't compiled for say WebAssembly.

Second, the native libraries are usually built around CPython interop (i.e. the C API expects to interact with the CPython interpreter). Function does not (and will never) have a CPython interpreter (we generate full AOT compiled code).


Super cool! I've been a fan of this project for a while. I think this is super important for anyone who is trying to do ML inference on edge hardware, and in the browser too.


Thank you <3


Not sure about the LLM part. I still prefer my transformations to be done by LLVM, which in many cases is rigorously proven to be correct, instead of hopefully not hallucinating.


We use LLVM!

We're not translating Python directly to LLVM IR (I think I've seen other projects do this). We translate Python to C++/Rust first, where we have rigorous unit tests for every operation we support translating. We then use LLVM for downstream compilation to object code.

Here's some more context: https://docs.fxn.ai/predictors/compiler


How come performance is worse than Rust? Isn't this misleading?


It's marginal. The difference exists because Function adds syntactic sugar to make the developer experience of calling the compiled functions easy and smooth. The Function call isn't a direct call, whereas the Rust one is.

It's possible to hack Function to perform a direct call and avoid (most of) the overhead, but in 99.99% of use cases this won't matter cos most processing time will be spent in the body of the function, not the scaffolding that Function uses.


Yeah, Python at the Speed of Rust if you ignore the unavoidable overhead of still using Python.


There's actually no overhead from "still using Python" because we don't use Python. The overhead in Function exists solely because we add a bunch of sugar to create a unified interface for calling different kinds of functions (e.g. generator functions).

You don't have to take my word for it: you can pull the C++ source code that we generate from a given Python function and inspect it yourself: https://docs.fxn.ai/predictors/compiler#compiling-binaries


Have you seen Mojo?



More serious reply: we and Mojo have similar visions of where the world is going. The key difference between us is that Mojo is itself a new programming language. Sure, it supports Python, but it doesn't actually compile Python. It simply delegates all interactions with Python code to the CPython interpreter: https://docs.modular.com/mojo/why-mojo/#compatibility-with-p...

But beyond that, our goal is meeting devs where they are. This means that beyond just compiling Python code, we provide SDKs for different frameworks (JavaScript, Kotlin, Swift, React Native, Unity, etc) that devs can use to run these functions within their applications, in as little as two lines of code.

We're very (very) focused on developers shipping products that use Function. We're already embedded in web apps, apps on the App Store, Play Store, and other places.


What's the end goal of this? Would I be able to compile any Python program? Will I be able to compile Django or FastApi apps?


We're focused specifically on on-device AI inference (and related compute-bound algorithms, like computer vision or scientific computing).

We want devs to find or develop inference code (always in Python); decorate it with Function's `@compile`; compile it; and run natively in their Android, iOS, macOS, Linux, WebAssembly, or Windows applications.

We won't be supporting other use cases like web servers (no Django or FastAPI).


Why are you doing numerical code in single precision? If no one in your group did a numerical analysis course at college it’s time you moved on to think about some other problem more suited to your experience set.


No need to be snarky. The choice to default to FP32 is inspired by the fact that most typical use cases don't need double-precision (GPU shader languages and game engines do this all the time). This in turn allows us to vectorize code for 2x throughput compared to using FP64. We're gonna add a flag to change the default floating-point precision for devs who need extra precision.

See docs: https://docs.fxn.ai/predictors/requirements#floating-point-v...
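If you want a rough feel for where the 2x comes from: a 256-bit SIMD register holds 8 float32 lanes but only 4 float64 lanes, and the data itself takes half the memory bandwidth. A quick check with plain NumPy (nothing Function-specific):

  import numpy as np

  a32 = np.zeros(1_000_000, dtype=np.float32)
  a64 = np.zeros(1_000_000, dtype=np.float64)
  print(a32.nbytes, a64.nbytes)  # 4000000 vs 8000000 bytes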



