Quite old. Plain Python is still slow, but PyPy is in the same league as Node.js.
I rewrote the same program (compare two images, generate a third that shows the diff) in a number of languages. Considering CPython as the reference implementation (1x), I got 100x in Rust, 60x in Go, 12x in Node.js and 10-11x in PyPy.
Initially I got 4x with PyPy, but I did a light refactor, removing some map()s and zip()s that were gratuitous (over 3-element lists), and then PyPy got really fast.
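For a rough idea of the shape of that code, here is a minimal sketch of such a per-pixel diff loop (assuming images as nested lists of (r, g, b) tuples; the actual benchmarked program differs):

    # Hedged sketch: per-pixel absolute difference of two images stored
    # as rows of (r, g, b) tuples. Not the original program.
    def diff_images(a, b):
        out = []
        for row_a, row_b in zip(a, b):
            out_row = []
            for (r1, g1, b1), (r2, g2, b2) in zip(row_a, row_b):
                # Unrolled arithmetic: map()/zip() over 3-element tuples
                # here is exactly the kind of overhead PyPy punishes
                out_row.append((abs(r1 - r2), abs(g1 - g2), abs(b1 - b2)))
            out.append(out_row)
        return out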
You've picked a poor example, I think. And you're probably coding in an inefficient way.
Here's the OpenCV way for a pair of 2048x2048 images:
    import cv2
    import time

    t = time.clock()
    # Load both images as single-channel grayscale
    a = cv2.imread("./image_1.tiff", cv2.IMREAD_GRAYSCALE)
    b = cv2.imread("./image_0.tiff", cv2.IMREAD_GRAYSCALE)
    c = a - b  # element-wise difference (wraps around on uint8 underflow)
    cv2.imwrite("out.tiff", c)
    print(time.clock() - t)
Takes about 0.2 CPU seconds on average for me (note: time.clock measures CPU time on UNIX, unlike time.time).
    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <ctime>

    using namespace cv;
    using namespace std;

    int main() {
        clock_t begin = clock();
        // Load both images as single-channel grayscale
        Mat a = imread("./image_1.tiff", IMREAD_GRAYSCALE);
        Mat b = imread("./image_0.tiff", IMREAD_GRAYSCALE);
        Mat c = a - b;  // element-wise difference (saturates on underflow)
        imwrite("out.tiff", c);
        clock_t end = clock();
        double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
        cout << elapsed_secs << endl;
        return 0;
    }
Again, about 0.2 seconds. The difference is negligible if you use the right libraries. Python should not be your bottleneck for high-performance code.
That's true, but most of the time you can express the "new" stuff in terms of existing fast libraries.
I've had a few cases where you can't, and have become a big fan of Cython for that. It lets you add C type declarations to Python code, and then compile the module at import time. Example here: http://pastebin.com/sF8KmyiU
All of pure Python is still allowed in these modules, but the typed variables become plain C variables instead of Python objects, and loops become plain C loops. For this particular function, I got a 1000x speedup compared to the original Python code.
In the end, this isn't Python any more - but it's close enough, and only needed for loops that run over millions of items.
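For flavor, a minimal Cython sketch of the idea (not the pastebin code; the file and function names are made up):

    # hot.pyx -- hypothetical example, not the pastebin function.
    # The cdef-typed variables compile to plain C locals, and the
    # range() loop compiles to a plain C for loop.
    def sum_squares(double[:] data):
        cdef double total = 0.0
        cdef Py_ssize_t i
        for i in range(data.shape[0]):
            total += data[i] * data[i]
        return total

Compile-at-import works via pyximport (import pyximport; pyximport.install(), then import hot as usual), and you'd call it with any double buffer, e.g. a NumPy array.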
Excluding the write, I get around 0.07 seconds in C++ and the same in Python (good call, though).
I agree that in pure Python it'd be slower, but realistically, why would you do that? Unless you work somewhere where you're forced to write your own libraries... but even then you could write them in a lower-level language and call them from Python.
If you only removed the write and not the reads from the timing, I would guess the reads (even with warm caches) still dominate the time.
And you would want to do it in pure Python if you want to answer the question "how fast can we make interpreted Python?". Using C extensions for that is cheating, as it isn't Python and it isn't interpreted. You don't answer the question "how fast can you run?" with "30 km an hour, using a bicycle", either.
If you make those images large enough (and I guess 2048x2048 qualifies), any language that uses OpenCV to do the job will give results in the same ballpark. For example, by scaling up the images you can make the difference between Python implementations that call OpenCV as small as you want.
Not the parent, but I think the point is that, in the real world, the vast majority of use cases for which Python is slow are ones where you would use an existing library written in a lower-level language. So questions like "How fast can we make matrix multiplication in Python?" are irrelevant for the vast majority of Python developers because NumPy exists, and it's always going to be faster than anything you can write in pure Python.
In IPython, %timeit gives 4.3 ms per loop just on a-b. In C++ it's about the same.
I agree that the question is valid - making vanilla Python faster is cool. My point was that this particular example (image processing) was flawed, because it's not something a sane person would ever do in pure Python.
"In Ipython %timeit gives 4.3ms per loop, just on a-b. In C++ it's about the same."
Of course it is about the same. Except for function entry and exit, which should cost a few thousand instructions at most, both run the exact same instruction sequence (assuming identical OpenCV versions, compiler, and compiler flags).
If you want an easily measurable difference, use much smaller images and make a few thousand or even a few million calls, or look at the Python sources to see how efficiently it calls into C.
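As a hedged sketch of that measurement (using NumPy arrays as stand-ins for the images; the sizes are arbitrary):

    import numpy as np
    import timeit

    # With tiny arrays, the Python-to-C call overhead dominates each call;
    # with large arrays, the actual C loop over the data dominates.
    small_a = np.ones((8, 8), np.uint8)
    small_b = np.zeros((8, 8), np.uint8)
    big_a = np.ones((2048, 2048), np.uint8)
    big_b = np.zeros((2048, 2048), np.uint8)

    print(timeit.timeit(lambda: small_a - small_b, number=100000))  # mostly overhead
    print(timeit.timeit(lambda: big_a - big_b, number=100))         # mostly real work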
I was under the impression that Numpy also just calls BLAS underneath? Hence why doing element-wise calls in Numpy is far, far faster than simply doing nested for loops.
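A quick way to see that gap (a hedged sketch; exact ratios vary by machine):

    import numpy as np
    import timeit

    a = np.random.rand(500, 500)
    b = np.random.rand(500, 500)

    def loop_add(a, b):
        # One interpreter round-trip per element
        return [[a[i, j] + b[i, j] for j in range(a.shape[1])]
                for i in range(a.shape[0])]

    print(timeit.timeit(lambda: a + b, number=100))         # one C loop per call
    print(timeit.timeit(lambda: loop_add(a, b), number=1))  # orders of magnitude slower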
But I think this is the great strength of Python. It's a glue language. If you need speed, you can always write a wrapper around a C/C++ library.
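As a minimal illustration of that glue role, a sketch using ctypes against the standard C math library (the library lookup is platform-dependent):

    import ctypes
    import ctypes.util

    # Locate and load the C math library, then call its cos() directly.
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]
    print(libm.cos(0.0))  # 1.0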
The standard response from the Rust team is that Rust should match or beat non-SIMD C++ performance and if it doesn't, you should file a bug.
Note: The first thing anybody will ask when you complain about Rust being slow is whether you compiled with optimizations turned on (`cargo build --release`) since it tends to make a 10-15x difference.
Theoretically, the Rust borrow checker also knows enough about your code's aliasing and dispatch semantics that this additional information could enable deeper optimizations than are available in either C or C++. Numerical code in Rust could compete with Fortran in performance, but I don't know whether any of that has been realized in Rust yet.
But it's high performance given the semantics of the language. The work that has gone into making V8 perform as it does is extraordinary and should be respected, not mocked as 'hilarious'.
The benchmarks show that by using another language his code went 8x faster. Perhaps if he optimized his C++ it would go even faster. It's funny that people are calling 10x off the theoretical maximum "high performance". I wonder if these people are living in a JavaScript bubble.