> The expected speedup is around 100x, or 1000x for numerical stuff or allocation-heavy work that can be done statically. Whenever you get 10,000x or above, it's because you've written a better algorithm.
Anecdotally, I recently rewrote a piece of Python code in Rust and got a ~300x speedup, but let's be conservative and call it 100x. Now let's extrapolate from that. In native code you can use SIMD, which can give you a 10x speedup, so now we're at 1000x. In native code you can also easily use multiple threads; assuming a machine with a reasonably high number of cores, say 32 (because that's what I had for the last 4 years), we're now at a 32,000x speedup. So to me those are very realistic numbers, assuming of course that the problem you're solving can be sped up with SIMD and multiple threads, which is not always the case. So you're probably mostly right.
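As a back-of-the-envelope sketch, the compounding above is just multiplication of independent factors. All three numbers are the assumptions from the comment, not measurements:

```python
# Back-of-envelope compounding of independent speedup factors.
# Every number here is an illustrative assumption, not a benchmark.
native_over_python = 100  # conservative native-rewrite speedup
simd_factor = 10          # assumes the workload vectorizes well
thread_factor = 32        # assumes near-linear scaling on 32 cores

total = native_over_python * simd_factor * thread_factor
print(total)  # 32000
```

Of course the factors only multiply like this when they're genuinely independent, i.e. the SIMD and threading gains don't overlap or hit a memory-bandwidth ceiling.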
Trivially parallelizable algorithms are definitely in the "not generally applicable" regime. But you're right, they're capable of hitting arbitrarily large, hardware-dependent speedups. And that's definitely something a sufficiently intelligent compiler should be able to capture through dependency analysis.
Note that I don't doubt the 35k speedup -- I've seen speedups into the millions -- I'm just saying there's no way that can be a representative speedup that users should expect to see.
There's a serialization overhead both on dispatch and return that makes multiprocessing in Python unsuitable for some problems that would otherwise be solved well with threads in other languages.
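To make that overhead concrete: multiprocessing pickles every argument on dispatch and every result on return, so you pay a full serialize/deserialize round trip per task even when the worker does almost no work. A minimal sketch (the payload size is arbitrary, just large enough to make the cost visible):

```python
import pickle
import time

# multiprocessing pickles arguments in the parent and unpickles them in
# the worker, then does the reverse for the result. This measures just
# that round trip, without spawning a process.
payload = list(range(1_000_000))  # hypothetical large work item

start = time.perf_counter()
blob = pickle.dumps(payload)   # cost paid on every dispatch
restored = pickle.loads(blob)  # cost paid again in the worker
elapsed = time.perf_counter() - start

print(f"serialized {len(blob)} bytes in {elapsed:.3f}s")
```

With real threads and shared memory, that cost is simply zero, which is why some workloads that thread well in other languages don't survive the move to Python's multiprocessing.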
The other languages are not taking and releasing a globally exclusive lock (the GIL) every time execution crosses an API boundary, so "shared nothing" in those languages is truly shared nothing. Additionally, Python's multiprocessing carries a lot of restrictions which make it hard to pass more complex messages.
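One such restriction, sketched below: because every message goes through pickle, many ordinary Python objects (lambdas, closures, open file handles, locks) simply can't be sent to a worker:

```python
import pickle

# multiprocessing serializes messages with pickle, and pickle can't
# handle lambdas (it pickles functions by qualified name, which a
# lambda doesn't have) -- so a lambda can't be passed as a message.
try:
    pickle.dumps(lambda x: x * 2)
    picklable = True
except Exception:  # pickle raises PicklingError/AttributeError here
    picklable = False

print(picklable)  # False
```

The usual workarounds (module-level functions only, or third-party serializers) are exactly the kind of friction the parent comment is describing.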