Hacker News

Nvidia calls them "cores" to deliberately confuse people and make its GPUs appear vastly more powerful than they really are. What they are in reality is SIMD lanes.

So the H100 (which costs vastly more than a Zen 5) has 14592 32-bit SIMD lanes, not cores.

A Zen 5 has 16x4 (64) 32-bit SIMD lanes per core: four 512-bit SIMD pipes, each holding 16 fp32 lanes. Scale that by core count to get your answer. A higher-end desktop Zen 5 has 16 cores, so 64x16 = 1024. The Zen 5 also clocks much higher than the GPU, so you can scale it up by perhaps 1.5-2x.
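A back-of-the-envelope sketch of the lane math above (the ~2x clock advantage is an illustrative assumption from the comment, not a spec value):

```python
# Zen 5: four 512-bit SIMD pipes per core, 32-bit (fp32) lanes.
lanes_per_core = (512 // 32) * 4        # 16 lanes per pipe x 4 pipes = 64
cores = 16                              # high-end desktop part
zen5_lanes = lanes_per_core * cores     # 1024

h100_lanes = 14592                      # H100 fp32 "CUDA cores"

raw_ratio = h100_lanes / zen5_lanes     # lane count ratio, ignoring clocks

clock_advantage = 2.0                   # assumed CPU clock advantage (~1.5-2x)
effective_ratio = raw_ratio / clock_advantage

print(zen5_lanes, raw_ratio, effective_ratio)
```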

While this is obviously less than the H100, the Zen 5 chip costs $550 and the H100 costs $40k.

There is more to it than this: GPUs also have transcendental functions, texture sampling, and 16-bit ops (which are mostly lacking in CPUs), while CPUs are much more flexible and have powerful byte and integer manipulation instructions, along with full-speed 64-bit integer/double support.




Thanks for the clarification on the Nvidia terminology, I didn't know that. What I also found is that Nvidia groups 32 SIMD lanes into what they call a warp. Then 4 warps are grouped into what they're calling a streaming multiprocessor (SM). And lastly the H100 has 114 SMs, so 4x32x114 = 14592 checks out.
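The hierarchy described above can be written out as a quick sanity check:

```python
lanes_per_warp = 32     # SIMD lanes ("threads") per warp
warps_per_sm = 4        # warps' worth of fp32 units per SM
sms = 114               # SM count on this H100 variant

total_lanes = lanes_per_warp * warps_per_sm * sms
print(total_lanes)  # 14592
```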

> Zen 5 has 16x4(64) 32 bit SIMD lanes per core

Right, 2x FMA and 2x FADD pipes, so the highest-end Zen 5 part with 192 cores would total 12288 32-bit SIMD lanes, or half of that if we are counting only FMA ops. That is then indeed much closer to the 14592 32-bit SIMD lanes of the H100.
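The server-part arithmetic above, made explicit (the FMA/FADD split follows the comment; whether an apples-to-apples comparison should count FADD lanes is a judgment call):

```python
lanes_per_core = 64        # 2x FMA + 2x FADD pipes, 512 bits each, fp32 lanes
cores = 192                # highest-end Zen 5 server part

all_lanes = lanes_per_core * cores   # counting FMA + FADD pipes
fma_only = all_lanes // 2            # counting only the two FMA pipes

print(all_lanes, fma_only)  # 12288 6144
```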


There are x86 extensions for fp16/bf16 ops. E.g. both Zen 4 and Zen 5 support AVX512_BF16, which has vdpbf16ps, i.e. a dot product of pairs of bf16 elements from two arguments; that is, it takes a total of 64 bf16 elements and outputs 16 fp32 elements. Zen 5 can run two such instructions per cycle.
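A rough numpy emulation of the vdpbf16ps pairwise-dot-product semantics, for one 512-bit register's worth of data. The truncation-based bf16 conversion is a simplifying assumption (the hardware's rounding and internal accumulation behavior differ in detail):

```python
import numpy as np

def to_bf16(x):
    # Approximate fp32 -> bf16 by zeroing the low 16 mantissa bits
    # (truncation; real conversions typically round to nearest even).
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & 0xFFFF0000).view(np.float32)

def vdpbf16ps_emulated(acc, a, b):
    # acc: 16 fp32 accumulators; a, b: 32 bf16 values each (held here as
    # fp32 with truncated mantissas). Output lane i accumulates
    # a[2i]*b[2i] + a[2i+1]*b[2i+1].
    prods = to_bf16(a) * to_bf16(b)
    return acc + prods[0::2] + prods[1::2]

acc = np.zeros(16, dtype=np.float32)
a = np.ones(32, dtype=np.float32)        # exactly representable in bf16
b = np.full(32, 2.0, dtype=np.float32)   # exactly representable in bf16
out = vdpbf16ps_emulated(acc, a, b)      # each lane: 1*2 + 1*2 = 4.0
```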




