Because the JVM does not have access to the hardware features needed to execute this efficiently. Also, the power of LAPACK comes from the existing highly tuned implementations, not from the interface itself; the reference BLAS and LAPACK are actually quite slow.
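To make the "tuned implementations, not the interface" point concrete, here's a minimal sketch in plain Java (all names are made up for illustration, not from any library): two loop orderings of the same matrix multiply. The naive ijk order, which is essentially what a reference-style implementation does, strides through memory badly; the ikj reordering streams through rows, which is friendlier to caches and to what the JIT can do. Real tuned BLAS goes much further (blocking, SIMD, prefetch), but even this one change usually gives a large speedup.

```java
// Sketch: same interface, very different performance characteristics.
// Both methods compute C = A * B for n x n row-major matrices.
public class MatMulSketch {
    // Naive ijk order: the inner loop walks b column-wise (stride n),
    // which is cache-hostile for row-major storage.
    static void naiveIjk(double[] a, double[] b, double[] c, int n) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double s = 0;
                for (int k = 0; k < n; k++)
                    s += a[i * n + k] * b[k * n + j]; // strided access into b
                c[i * n + j] = s;
            }
    }

    // Reordered ikj: the inner loop walks b and c contiguously,
    // so the hardware prefetcher and the JIT can do a much better job.
    static void reorderedIkj(double[] a, double[] b, double[] c, int n) {
        java.util.Arrays.fill(c, 0.0);
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++)
                    c[i * n + j] += aik * b[k * n + j]; // contiguous access
            }
    }

    public static void main(String[] args) {
        int n = 64;
        double[] a = new double[n * n], b = new double[n * n];
        for (int i = 0; i < n * n; i++) { a[i] = i % 7; b[i] = i % 5; }
        double[] c1 = new double[n * n], c2 = new double[n * n];
        naiveIjk(a, b, c1, n);
        reorderedIkj(a, b, c2, n);
        // Both orderings add the k-terms in the same order, so the
        // results are bitwise identical.
        System.out.println(java.util.Arrays.equals(c1, c2)); // prints "true"
    }
}
```

The point is that the calling convention (the "interface") is identical in both cases; everything that matters lives in how the kernel is written, which is exactly where MKL and OpenBLAS invest their effort.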
In fact, you can access the hardware through native calls via JNI (or JNA). Of course, you then have to embed multi-platform native libraries and manage the associated issues.
Also, the OpenBLAS implementation is very well optimized for several Intel and AMD processors (you can compile it so that it autodetects at runtime which one you're running on). It can even reach the efficiency of Intel's MKL implementation in single-threaded mode.
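For reference, that autodetection is a build-time option in OpenBLAS (the `DYNAMIC_ARCH` flag from its Makefile; check your version's docs for the exact target list):

```shell
# Build OpenBLAS with kernels for many CPU cores baked in;
# the appropriate kernel is selected at runtime by CPU detection.
make DYNAMIC_ARCH=1
make PREFIX=/usr/local install
```

Without `DYNAMIC_ARCH`, the build is tuned only for the machine it was compiled on (or for an explicit `TARGET=...`), which is the faster but less portable choice.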
We don't even have to guess, since that's exactly what Neanderthal does. Also, I micro-benchmarked lots of options and have yet to find one that fills a similar use case and is faster than Neanderthal+MKL on the CPU, regardless of the JNI overhead (minus the obvious direct use of MKL, but that is much more low-level code). Most higher-level libraries have considerable overhead; Neanderthal's is tiny.
OpenBLAS's huge drawback is that it only covers BLAS, without LAPACK, sparse matrices, tensors, FFT, etc.
Anyway, regarding the OP's comment, I guess they meant to suggest implementing all of that in pure Java, not Java + FFI, since with FFI the native code still has to be written in something other than Java.
I guess that when it comes to testing, Intel has more resources to throw at it than most other folks. That's one of the reasons I use MKL despite it not being open source (though it is free as in beer).