
Quite old, but still true: plain Python is slow, while PyPy is in the same league as Node.js.

I rewrote the same program (compare two images, generate a third that shows the diff) in a number of languages. Taking CPython as the baseline (1x), I got 100x in Rust, 60x in Go, 12x in Node.js, and 10-11x in PyPy.

Initially I got 4x with PyPy, but after a light refactor, removing some map()s and zip()s that were gratuitous (they only operated on 3-element lists), PyPy got really fast.
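
Roughly, the change looked like this (a hypothetical sketch with made-up names; the point is just unrolling fixed-size map()/zip() calls into plain indexing):

    # before: building a 3-element diff with map() over pixel tuples
    def diff_pixel_slow(p1, p2):
        return list(map(lambda a, b: abs(a - b), p1, p2))

    # after: unrolled for the fixed 3-element (R, G, B) case --
    # a much simpler trace for PyPy's JIT
    def diff_pixel_fast(p1, p2):
        return [abs(p1[0] - p2[0]),
                abs(p1[1] - p2[1]),
                abs(p1[2] - p2[2])]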




You've picked a poor example, I think. And you're probably coding in an inefficient way.

Here's the OpenCV way for a pair of 2048x2048 images:

    import cv2
    import time

    # time.clock() measures CPU time on UNIX (deprecated in Python 3,
    # removed in 3.8; use time.process_time() there)
    t = time.clock()

    a = cv2.imread("./image_1.tiff", cv2.IMREAD_GRAYSCALE)
    b = cv2.imread("./image_0.tiff", cv2.IMREAD_GRAYSCALE)

    # note: uint8 subtraction wraps around on underflow;
    # cv2.absdiff(a, b) gives a true absolute difference
    c = a - b

    cv2.imwrite("out.tiff", c)

    print(time.clock() - t)
  
Takes about 0.2 CPU seconds on average for me (note the use of time.clock, which measures CPU time on UNIX, rather than time.time).

    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <ctime>

    using namespace cv;
    using namespace std;

    int main(void){

      clock_t begin = clock();

      Mat a = imread("./image_1.tiff", IMREAD_GRAYSCALE);
      Mat b = imread("./image_0.tiff", IMREAD_GRAYSCALE);

      // note: cv::Mat subtraction saturates at 0 for unsigned types
      // rather than wrapping around
      Mat c = a - b;

      imwrite("out.tiff", c);

      clock_t end = clock();
      double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
      cout << elapsed_secs << endl;

      return 0;
    }

Again, about 0.2 seconds. The difference is negligible if you use the right libraries; Python should not be your bottleneck for high-performance code.


> The difference is negligible if you use the right libraries.

OpenCV is written in C++, so it's going to be fairly efficient to call out to it from any language.

For me, most of the time spent in our code is in 'business logic', which necessarily lives in the main language of the codebase.

That's where PyPy gets its wins.
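
To make "business logic" concrete: think of branchy, per-record code like the hypothetical loop below, which no C library can take over. CPython interprets every iteration; PyPy JIT-compiles the whole loop.

    # hypothetical interpreter-bound hot loop: per-record work with
    # branches, not expressible as a single vectorized library call
    def total_outstanding(invoices):
        total = 0.0
        for inv in invoices:
            if inv["paid"]:
                continue
            amount = inv["amount"]
            if inv["days_overdue"] > 30:
                amount *= 1.05  # late fee
            total += amount
        return total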


My conclusion is: as long as you don't do anything new, Python is fast enough.


That's true, but most of the time you can express the "new" stuff in terms of existing fast libraries.

I've had a few cases where you can't, and have become a big fan of Cython for those. It lets you add C type declarations to Python code and compile the module at import time. Example here: http://pastebin.com/sF8KmyiU

All of pure Python is still allowed in these modules, but the typed variables become plain C variables instead of Python objects, and loops over them become pure C loops. For this particular function, I got a 1000x speedup compared to the original Python code.

In the end, this isn't Python any more - but it's close enough, and only needed for loops that run over millions of items.
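
In case the pastebin goes stale, here's a minimal sketch of the pattern (a hypothetical function, not the pastebin code):

    # fast_loop.pyx -- typed variables become plain C variables,
    # and the loop compiles to a pure C loop
    def sum_squares(int n):
        cdef long total = 0
        cdef int i
        for i in range(n):
            total += i * i
        return total

    # compiled at import time via pyximport:
    #   import pyximport; pyximport.install()
    #   from fast_loop import sum_squares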


Now, try implementing that matrix subtraction, or something more complex such as blurring or edge detection, in pure Python, and compare the results.

Also, are you sure that doesn't measure disk speed?


I get around 0.07 seconds without the write in C++ and the same in Python (good call though).

I agree that in pure Python it'd be slower, but realistically, why would you do that? Unless you work somewhere where you're forced to write your own libraries... but even then you could still write those in a compiled language and wrap them.


If you only removed the write and not the reads from the timing, I would guess the reads (even with warm caches) still dominate the time.

And you would want to do it in pure Python if you want to answer the question "how fast can we make interpreted Python?". Using C extensions for that is cheating: it isn't Python and it isn't interpreted. You don't answer the question "how fast can you run?" with "30 km an hour, on a bicycle", either.

If you make those images large enough (and I guess 2k x 2k is large enough), any language that uses OpenCV to do the job will give results in the same ballpark. In fact, you can make the difference between any two Python implementations that call OpenCV as small as you want.
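
For reference, the pure-Python version under discussion would look something like this sketch (hypothetical, with images as nested lists of ints; this is what the question is really measuring):

    # one interpreted operation per pixel -- no NumPy, no OpenCV;
    # this is the code path "how fast can Python get?" is asking about
    def diff_images(img_a, img_b):
        return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
                for row_a, row_b in zip(img_a, img_b)]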


Not the parent, but I think the point is that, in the real world, the vast majority of use cases for which Python is slow are ones where you would use an existing library written in a lower-level language. So questions like "How fast can we make matrix multiplication in Python?" are irrelevant for the vast majority of Python developers because NumPy exists, and it's always going to be faster than anything you can write in pure Python.


In IPython, %timeit gives 4.3 ms per loop just on a-b. In C++ it's about the same.

I agree that the question is valid - making vanilla Python faster is cool. My point was that this particular example (image processing) was flawed, because it's not something a sane person would ever do in pure Python.


"In Ipython %timeit gives 4.3ms per loop, just on a-b. In C++ it's about the same."

Of course it is about the same. Except for function entry and exit, which should be a few thousand instructions at most, it runs the exact same instruction sequence (assuming identical library versions, compiler, and compiler flags).

If you want an easily measurable difference, use much smaller images and make a few thousand or even a few million calls, or look at the Python sources to see how efficiently it calls into C.
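
For example, something along these lines exposes the per-call wrapper overhead rather than the pixel work (a rough sketch):

    # tiny images, many calls: the time is dominated by the
    # Python -> C++ call overhead, not by the subtraction itself
    import timeit
    import cv2
    import numpy as np

    a = np.zeros((8, 8), np.uint8)
    b = np.ones((8, 8), np.uint8)

    print(timeit.timeit(lambda: cv2.subtract(a, b), number=100000))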


Ok. But I would do it in Numba.


Be aware that Python's OpenCV module just wraps C++ calls; it's only a thin wrapper. SciPy or NumPy might be better examples.


I was under the impression that NumPy also just calls BLAS underneath? Hence why element-wise operations in NumPy are far, far faster than nested for loops.

But I think this is the great strength of Python: it's a glue language. If you need speed, you can always write a wrapper around a C/C++ library.
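
A quick sketch of that gap (the exact numbers will vary, but the vectorized version is typically orders of magnitude faster):

    import timeit
    import numpy as np

    a = np.random.rand(512, 512)
    b = np.random.rand(512, 512)

    def loop_diff(x, y):
        # nested Python loops: one interpreter dispatch per element
        out = np.empty_like(x)
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                out[i, j] = x[i, j] - y[i, j]
        return out

    print(timeit.timeit(lambda: a - b, number=100))           # vectorized C loop
    print(timeit.timeit(lambda: loop_diff(a, b), number=1))   # interpreted loops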


Funny you should say C/C++, because BLAS is Fortran.


Actually, it's mostly assembly (depending on the particular implementation). LAPACK is Fortran, though.


> SciPy or NumPy might be better examples.

Python comprises less than 50% of the code in both of those repositories, but they are certainly great for learning the CPython API.


I've never used Rust; how would it compare to a C++ solution?


The standard response from the Rust team is that Rust should match or beat non-SIMD C++ performance, and if it doesn't, you should file a bug.

Note: the first thing anybody will ask when you complain about Rust being slow is whether you compiled with optimizations turned on (`cargo build --release`), since that tends to make a 10-15x difference.


In theory, the Rust borrow checker also knows enough about your code's aliasing and dispatch semantics that the compiler could use that information for deeper optimizations than are available in either C or C++. Numerical code in Rust could compete with Fortran in performance, but I don't know whether any of that has been actualized in Rust yet.


I think some of that might start to come along with the growth of compiler plugins, which are feature-gated to nightly builds right now.


By what measurement?

I think the only thing I could say is that it would be safer? And possibly terser and easier to understand.


As a C++ developer I find it hilarious that Node.js counts as the high-performance league! (Your benchmark, for example, shows Rust as roughly 8x faster.)


But it is high performance, given the semantics of the language. The work that has gone into making V8 perform as it does is extraordinary and should be respected, not mocked as 'hilarious'.


I believe the "high performance" label usually refers to Node.js's non-blocking I/O.


The benchmarks show that by using another language his code went 8x faster. Perhaps if he wrote optimized C++ it would go even faster. It's funny that people call something 10x off the theoretical maximum "high performance". I wonder if these people are living in a JavaScript bubble.



