Because the JVM does not have access to the hardware features needed to execute this efficiently. Also, the power of LAPACK comes from the existing highly tuned implementations, not from the interface itself; the reference BLAS and LAPACK are actually quite slow.
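To make the "tuned implementations, not the interface" point concrete, here's a minimal sketch in plain Java (all names are made up for illustration, not from any library): two loop orderings of the same matrix multiply. The naive ijk order, which is essentially what a reference-style implementation does, strides through memory badly; the ikj reordering streams through rows, which is friendlier to caches and to what the JIT can do. Real tuned BLAS goes much further (blocking, SIMD, prefetch), but even this one change usually gives a large speedup.

```java
// Sketch: same interface, very different performance characteristics.
// Both methods compute C = A * B for n x n row-major matrices.
public class MatMulSketch {
    // Naive ijk order: the inner loop walks b column-wise (stride n),
    // which is cache-hostile for row-major storage.
    static void naiveIjk(double[] a, double[] b, double[] c, int n) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double s = 0;
                for (int k = 0; k < n; k++)
                    s += a[i * n + k] * b[k * n + j]; // strided access into b
                c[i * n + j] = s;
            }
    }

    // Reordered ikj: the inner loop walks b and c contiguously,
    // so the hardware prefetcher and the JIT can do a much better job.
    static void reorderedIkj(double[] a, double[] b, double[] c, int n) {
        java.util.Arrays.fill(c, 0.0);
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++)
                    c[i * n + j] += aik * b[k * n + j]; // contiguous access
            }
    }

    public static void main(String[] args) {
        int n = 64;
        double[] a = new double[n * n], b = new double[n * n];
        for (int i = 0; i < n * n; i++) { a[i] = i % 7; b[i] = i % 5; }
        double[] c1 = new double[n * n], c2 = new double[n * n];
        naiveIjk(a, b, c1, n);
        reorderedIkj(a, b, c2, n);
        // Both orderings add the k-terms in the same order, so the
        // results are bitwise identical.
        System.out.println(java.util.Arrays.equals(c1, c2)); // prints "true"
    }
}
```

The point is that the calling convention (the "interface") is identical in both cases; everything that matters lives in how the kernel is written, which is exactly where MKL and OpenBLAS invest their effort.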
In fact, you can access the hardware through native calls via JNI (or JNA). Of course, you then have to embed multi-platform native libraries and manage the associated issues.
Also, the OpenBLAS implementation is very well optimized for several Intel and AMD processors (you can compile it so that it autodetects at runtime which one you're running on). It can even reach the efficiency of Intel's MKL implementation in single-threaded mode.
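For reference, that autodetection is a build-time option in OpenBLAS (the `DYNAMIC_ARCH` flag from its Makefile; check your version's docs for the exact target list):

```shell
# Build OpenBLAS with kernels for many CPU cores baked in;
# the appropriate kernel is selected at runtime by CPU detection.
make DYNAMIC_ARCH=1
make PREFIX=/usr/local install
```

Without `DYNAMIC_ARCH`, the build is tuned only for the machine it was compiled on (or for an explicit `TARGET=...`), which is the faster but less portable choice.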
We don't even have to guess, since that's exactly what Neanderthal does. Also, I micro-benchmarked lots of options and have yet to find one that fills a similar use case and is faster than Neanderthal+MKL on the CPU, regardless of the JNI overhead (minus the obvious direct use of MKL, but that is much more low-level code). Most higher-level libraries have considerable overhead; Neanderthal's is tiny.
OpenBLAS's huge drawback is that it only covers BLAS, without LAPACK, sparse matrices, tensors, FFT, etc.
Anyway, regarding the OP's comment, I guess they meant to suggest implementing all of that in pure Java, not Java + FFI, since with FFI the native code still has to be written in something other than Java.
I guess that when it comes to testing, Intel has more resources to throw at it than most other folks. That's one of the reasons I use MKL despite it not being open source (though it is free as in beer).