
> The expected speedup is around 100x, or 1000x for numerical stuff or allocation-heavy work that can be done statically. Whenever you get 10,000x or above, it's because you've written a better algorithm.

Anecdotally, I recently rewrote a piece of Python code in Rust and got a ~300x speedup, but let's be conservative and call it 100x. Now extrapolate from there. In native code you can use SIMD, which can give you a 10x speedup, so now we're at 1000x. In native code you can also easily use multiple threads, so assuming a machine with a reasonably high core count, say 32 (because that's what I had for the last 4 years), we're now at a 32000x speedup. So those are very realistic numbers to me, assuming of course that the problem you're solving can be sped up with SIMD and multiple threads, which is not always the case. So you're probably mostly right.




Trivially parallelizable algorithms are definitely in the "not generally applicable" regime. But you're right, they're capable of hitting arbitrarily large, hardware-dependent speedups. And that's definitely something a sufficiently intelligent compiler should be able to capture through dependency analysis.

Note that I don't doubt the 35k speedup -- I've seen speedups into the millions -- I'm just saying there's no way that can be a representative speedup that users should expect to see.


Python can use multiprocessing with a shared-nothing architecture to use those 32 cores.


I was about to say the same thing.

Multiprocessing on Python works great and isn't even very hard if you use, say, apply_async with a Pool.

Comparing single-threaded Python with multiprocessing in Language X is unfair, if not disingenuous.
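For reference, a minimal sketch of what that looks like: apply_async on a multiprocessing.Pool fans CPU-bound calls out to worker processes. The work function and inputs here are made up for illustration.

```python
from multiprocessing import Pool

def heavy(n):
    # Placeholder CPU-bound task: sum of squares below n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # apply_async dispatches each call to a worker process;
        # .get() blocks until that call's result is ready.
        handles = [pool.apply_async(heavy, (n,)) for n in (10_000, 20_000)]
        print([h.get() for h in handles])
```

Each worker runs in its own process with its own interpreter, so the GIL doesn't serialize them.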


> Multiprocessing on Python works great and isn't even very hard if you use, say, apply_async with a Pool.

Multiprocessing works great if you don't really need a shared memory space for your task. If it's very loosely coupled, that's fine.

But if you use something that can benefit from real threading, Python very often clamps you to about 1.5-2.5 cores' worth of throughput.
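The clamping is easy to reproduce. A sketch under the assumption of a GIL-enabled CPython build: two threads running pure-Python CPU-bound work take roughly as long as running the same work sequentially, because only one thread executes bytecode at a time. The workload is illustrative.

```python
import threading
import time

def busy(n):
    # Pure-Python CPU-bound loop; holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

start = time.perf_counter()
busy(N)
busy(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=busy, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On a GIL build, `threaded` is close to `serial`, not half of it.
print(f"serial {serial:.3f}s, threaded {threaded:.3f}s")
```

Threads do help when the work releases the GIL (I/O, or C extensions like NumPy that drop it internally).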


There's a serialization overhead both on dispatch and return that makes multiprocessing in Python unsuitable for some problems that would otherwise be solved well with threads in other languages.
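That overhead is concrete: arguments and results cross the process boundary as pickled bytes, so a large payload gets serialized on dispatch and deserialized again on return, even if the worker barely touches it. A minimal sketch of just the pickling cost, with a made-up payload:

```python
import pickle

# Stand-in for a large argument you'd pass to a worker process.
payload = list(range(1_000_000))

blob = pickle.dumps(payload)   # cost paid on dispatch
roundtrip = pickle.loads(blob) # paid again when results come back

print(len(blob), roundtrip[:3])
```

With threads in other languages, the same data would be passed by reference for free; that's the gap being described.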


Unless you don't need to change your code.


The other languages are not taking and releasing a globally mutually exclusive GIL every time execution crosses an API boundary, so "shared nothing" in those languages is truly shared nothing. Additionally, Python's multiprocessing carries a lot of restrictions that make it hard to pass more complex messages.


And each of these threads will still run at Python interpreter performance.

Nothing prevents something like Mojo from also using those same 32 threads, but with 10-100x the performance.


Hear me out… we can write bad Python code to justify impressive speed boosts from rewriting it in Rust.

That way we can justify rewriting stuff in Rust to our bosses!

If we write decent Python, and perhaps even change one line to use PyPy, the speedup won't be impressive and we won't get to play with Rust!





