Hacker News

Nvidia calls them "cores" to deliberately confuse people and make its GPUs appear vastly more powerful than they really are. What they are in reality is SIMD lanes.

So the H100 (which costs vastly more than a Zen 5) has 14592 32-bit SIMD lanes, not cores.

A Zen 5 has 16x4 (64) 32-bit SIMD lanes per core: four 512-bit SIMD pipes, each holding 16 fp32 lanes. Scale that by core count to get your answer. A higher-end desktop Zen 5 has 16 cores, so 64x16 = 1024. The Zen 5 also clocks much higher than the GPU, so you can scale it up by perhaps 1.5-2x.
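A back-of-the-envelope sketch of the lane math above (the ~2x clock advantage is an illustrative assumption from the comment, not a spec value):

```python
# Zen 5: four 512-bit SIMD pipes per core, 32-bit (fp32) lanes.
lanes_per_core = (512 // 32) * 4        # 16 lanes per pipe x 4 pipes = 64
cores = 16                              # high-end desktop part
zen5_lanes = lanes_per_core * cores     # 1024

h100_lanes = 14592                      # H100 fp32 "CUDA cores"

raw_ratio = h100_lanes / zen5_lanes     # lane count ratio, ignoring clocks

clock_advantage = 2.0                   # assumed CPU clock advantage (~1.5-2x)
effective_ratio = raw_ratio / clock_advantage

print(zen5_lanes, raw_ratio, effective_ratio)
```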

While this is obviously less than the H100, the Zen 5 chip costs $550 and the H100 costs $40k.

There is more to it than this: GPUs also have transcendental functions, texture sampling, and 16-bit ops (which are mostly lacking in CPUs), while CPUs are much more flexible and have powerful byte and integer manipulation instructions, along with full-speed 64-bit integer/double support.




Thanks for the clarification on the Nvidia terminology, I didn't know that. What I also found is that Nvidia groups 32 SIMD lanes into what they call a warp. Then 4 warps are grouped into what they're calling a streaming multiprocessor (SM). And lastly the H100 has 114 SMs, so 4x32x114 = 14592 checks out.
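The hierarchy described above can be written out as a quick sanity check:

```python
lanes_per_warp = 32     # SIMD lanes ("threads") per warp
warps_per_sm = 4        # warps' worth of fp32 units per SM
sms = 114               # SM count on this H100 variant

total_lanes = lanes_per_warp * warps_per_sm * sms
print(total_lanes)  # 14592
```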

> Zen 5 has 16x4(64) 32 bit SIMD lanes per core

Right, 2x FMA and 2x FADD pipes, so the highest-end Zen 5 part with 192 cores would total 12288 32-bit SIMD lanes, or half of that if we are counting only FMA ops. That is then indeed much closer to the 14592 32-bit SIMD lanes of the H100.
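The server-part arithmetic above, made explicit (the FMA/FADD split follows the comment; whether an apples-to-apples comparison should count FADD lanes is a judgment call):

```python
lanes_per_core = 64        # 2x FMA + 2x FADD pipes, 512 bits each, fp32 lanes
cores = 192                # highest-end Zen 5 server part

all_lanes = lanes_per_core * cores   # counting FMA + FADD pipes
fma_only = all_lanes // 2            # counting only the two FMA pipes

print(all_lanes, fma_only)  # 12288 6144
```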


There are x86 extensions for fp16/bf16 ops. E.g. both Zen 4 and Zen 5 support AVX512_BF16, which has vdpbf16ps, i.e. a dot product of pairs of bf16 elements from two arguments; that is, it takes a total of 64 bf16 elements and outputs 16 fp32 elements. Zen 5 can run two such instructions per cycle.
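A rough numpy emulation of the vdpbf16ps pairwise-dot-product semantics, for one 512-bit register's worth of data. The truncation-based bf16 conversion is a simplifying assumption (the hardware's rounding and internal accumulation behavior differ in detail):

```python
import numpy as np

def to_bf16(x):
    # Approximate fp32 -> bf16 by zeroing the low 16 mantissa bits
    # (truncation; real conversions typically round to nearest even).
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & 0xFFFF0000).view(np.float32)

def vdpbf16ps_emulated(acc, a, b):
    # acc: 16 fp32 accumulators; a, b: 32 bf16 values each (held here as
    # fp32 with truncated mantissas). Output lane i accumulates
    # a[2i]*b[2i] + a[2i+1]*b[2i+1].
    prods = to_bf16(a) * to_bf16(b)
    return acc + prods[0::2] + prods[1::2]

acc = np.zeros(16, dtype=np.float32)
a = np.ones(32, dtype=np.float32)        # exactly representable in bf16
b = np.full(32, 2.0, dtype=np.float32)   # exactly representable in bf16
out = vdpbf16ps_emulated(acc, a, b)      # each lane: 1*2 + 1*2 = 4.0
```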




