
This has gotten a bit better. As of the last time I checked, MKL uses Zen-specific kernels for sgemm/dgemm. Unfortunately, these kernels are slower than the AVX2 kernels, but at least MKL no longer uses the pre-modern SIMD kernels on AMD Zen.

Edit, comparison:

    $ perf record target/release/gemm-benchmark  -d 1024
    Threads: 1
    Iterations per thread: 1000
    Matrix shape: 1024 x 1024
    GFLOPS/s: 96.36
    $ perf report --stdio -q | head -n3
        97.18%  gemm-benchmark  gemm-benchmark      [.] mkl_blas_def_sgemm_kernel_0_zen
         1.94%  gemm-benchmark  gemm-benchmark      [.] mkl_blas_def_sgemm_scopy_down16_bdz
         0.78%  gemm-benchmark  gemm-benchmark      [.] mkl_blas_def_sgemm_scopy_right4_bdz

After disabling Intel CPU detection:

    $ perf record target/release/gemm-benchmark  -d 1024
    Threads: 1
    Iterations per thread: 1000
    Matrix shape: 1024 x 1024
    GFLOPS/s: 129.12
    $ perf report --stdio -q | head -n3
        97.02%  gemm-benchmark  libmkl_avx2.so.1        [.] mkl_blas_avx2_sgemm_kernel_0
         1.77%  gemm-benchmark  libmkl_avx2.so.1        [.] mkl_blas_avx2_sgemm_scopy_down24_ea
         1.02%  gemm-benchmark  libmkl_avx2.so.1        [.] mkl_blas_avx2_sgemm_scopy_right4_ea

Benchmarked using https://github.com/danieldk/gemm-benchmark and oneMKL 2021.3.0.
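
The comment does not say how the Intel CPU detection was disabled. One approach that has been described for Linux is to override MKL's internal `mkl_serv_intel_cpu_true` function so that it always reports an Intel CPU. That this is what was done here is an assumption. The symbol is internal and undocumented, so it may be missing or behave differently in other MKL versions. A minimal sketch:

```c
/* Sketch only: a stub for MKL's internal CPU vendor check.
 * Assumption: MKL calls mkl_serv_intel_cpu_true() to decide whether
 * it is running on an Intel CPU. This symbol is not part of the
 * documented MKL API and may change between releases. */
int mkl_serv_intel_cpu_true(void) {
    /* Always claim to be an Intel CPU, so that dispatch is expected
     * to fall through to the regular AVX2 code path. */
    return 1;
}
```

If the stub is saved under the illustrative name fakeintel.c, it could be built with `gcc -shared -fPIC -o libfakeintel.so fakeintel.c` and loaded with `LD_PRELOAD=./libfakeintel.so target/release/gemm-benchmark -d 1024`. Whether preloading takes effect depends on how MKL is linked into the binary. With static linking, the stub would likely have to be compiled into the program itself.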



How could one do your trick on Windows?



