CPU numbers are off: an FMA counts as 2 operations, and Zen 5 can do 2 of them per cycle in addition to two adds, so it would be 6 FLOPs per cycle per lane, not 4 (GPU numbers are always quoted this way, so it is only fair to do the same for the CPU).
Also, the 9950X has 32 threads, but that is with SMT (hyperthreading), so it only has 16 physical cores; the correct scaling factor is therefore 16 cores * 16 SIMD lanes. Either way, the final number is 8.678 32-bit float TFLOPS.
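A small sketch of the peak-throughput arithmetic described above, under stated assumptions. The per-lane count (2 FMAs at 2 FLOPs each, plus 2 adds) and the 16 cores * 16 lanes factor come from the comment. The clock is not given in the comment: 5.65 GHz is an assumed value, chosen only because it reproduces the 8.678 TFLOPS figure.

```python
# Theoretical FP32 peak for a Ryzen 9 9950X, using the comment's counting rules.
PHYSICAL_CORES = 16                    # 32 threads via SMT, but 16 physical cores
FP32_LANES = 16                        # 512-bit vectors / 32-bit floats
FLOPS_PER_LANE_PER_CYCLE = 2 * 2 + 2   # 2 FMAs (2 FLOPs each) + 2 adds = 6
CLOCK_HZ = 5.65e9                      # ASSUMPTION: clock implied by the 8.678 TFLOPS figure


def peak_tflops(cores: int, lanes: int, flops_per_lane: int, clock_hz: float) -> float:
    """Peak TFLOPS = cores * lanes * FLOPs per lane per cycle * cycles per second."""
    return cores * lanes * flops_per_lane * clock_hz / 1e12


cpu_tflops = peak_tflops(PHYSICAL_CORES, FP32_LANES, FLOPS_PER_LANE_PER_CYCLE, CLOCK_HZ)
print(f"{cpu_tflops:.3f} TFLOPS")
```

Counting FMAs as one operation instead (4 per lane per cycle) would scale the result by 4/6, which is the discrepancy the comment is pointing out.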
The RTX 4090 is rated at 82.58 32-bit TFLOPS according to Nvidia, but it also costs far more than the 9950X ($1,600 vs. $650), so I find this comparison rather odd.
So it costs 2.46 times as much and delivers 9.5x the performance.
If you normalize for cost, the performance advantage is about 3.9x, which is roughly in line with the numbers Intel reported years ago when they debunked the whole "GPU is 100x better" nonsense.
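The cost normalization above is just two ratios divided by each other. A minimal sketch using the prices and TFLOPS figures quoted in the comment (no other data assumed):

```python
CPU_TFLOPS = 8.678   # Ryzen 9 9950X FP32 peak, as computed in the comment
GPU_TFLOPS = 82.58   # RTX 4090 FP32 peak, Nvidia's figure
CPU_PRICE = 650.0    # USD, as quoted in the comment
GPU_PRICE = 1600.0   # USD, as quoted in the comment

price_ratio = GPU_PRICE / CPU_PRICE              # how much more the GPU costs
perf_ratio = GPU_TFLOPS / CPU_TFLOPS             # raw peak-FLOPS advantage
perf_per_dollar_ratio = perf_ratio / price_ratio # advantage after normalizing for cost

print(f"price: {price_ratio:.2f}x  perf: {perf_ratio:.2f}x  perf/$: {perf_per_dollar_ratio:.2f}x")
```

Note that this compares theoretical peaks only; it says nothing about how close either device gets to its peak on a given workload.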
Anyway, I really hate the CUDA terminology where SIMD lanes are called "threads".
There are also a lot of other things to consider where either the CPU or the GPU has an advantage, such as:
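To illustrate the terminology complaint: on Nvidia hardware, CUDA "threads" are grouped into warps of 32, and a warp is what actually issues instructions, so a "thread" behaves much like one lane of a 32-wide SIMD unit. The sketch below assumes a 1-D thread block and a warp size of 32, and ignores divergence and independent thread scheduling; `cuda_thread_to_simd` is a hypothetical helper, not part of any CUDA API.

```python
WARP_SIZE = 32  # Nvidia warp width (assumed constant here)


def cuda_thread_to_simd(thread_idx_x: int, warp_size: int = WARP_SIZE) -> tuple[int, int]:
    """Map a 1-D threadIdx.x to (warp index, lane index) within its block."""
    return divmod(thread_idx_x, warp_size)


# A 256-"thread" block is really 8 warps, each one a 32-lane instruction stream.
block_dim = 256
warps = {cuda_thread_to_simd(t)[0] for t in range(block_dim)}
print(len(warps))
print(cuda_thread_to_simd(70))  # "thread" 70 is lane 6 of warp 2
```

For 2-D or 3-D blocks, the linear index (x + y * blockDim.x + z * blockDim.x * blockDim.y) would be used in place of `thread_idx_x`.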
GPU advantages:
Hardware sin/cos support (on Nvidia, at least)
abs/saturate are often just instruction modifiers
scaling by small powers of 2 is often free
16-bit floats are fully supported
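Two of the GPU items above are cheap for a simple reason: scaling by a power of 2 only changes a float's exponent, never its mantissa, and saturate is just a clamp to [0, 1]. The Python sketch below shows the semantics only (it does not show how any GPU encodes them as modifiers); `scale_by_pow2` and `saturate` are illustrative names, and NaN handling, which varies by hardware, is not modeled.

```python
import math


def scale_by_pow2(x: float, k: int) -> float:
    """Multiply x by 2**k by adjusting only the exponent (like C's ldexp)."""
    return math.ldexp(x, k)


def saturate(x: float) -> float:
    """Clamp a finite value to [0, 1], the effect of a GPU 'saturate' modifier."""
    return min(max(x, 0.0), 1.0)


x = 0.1
m, e = math.frexp(x)                       # x == m * 2**e, with 0.5 <= m < 1
m4, e4 = math.frexp(scale_by_pow2(x, 2))
print(m == m4, e4 - e)                     # mantissa unchanged, exponent +2
print(scale_by_pow2(x, 2) == x * 4)        # exact: no rounding is involved
print(saturate(1.7), saturate(-0.3), saturate(0.5))
```

This exactness holds as long as the result neither overflows nor underflows into the subnormal range.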
CPU advantages:
doubles are full speed, and you can interleave them with floats if you only need them for a few calculations
access to a wide variety of integer sizes and bit manipulation functions; GPUs have some of this, but not nearly as broad
lower level programming model
Decent points regarding relative strengths and weaknesses, but:
> lower level programming model
Do you mean that SASS (and the AMD equivalent) is poorly documented and lacks tooling, unlike the assembly languages of the various CPU architectures? Because otherwise, remember that you can write PTX code directly, and that is pretty low-level.
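As a rough illustration of the CPU-side point about integer sizes and bit manipulation: x86 exposes operations such as population count, leading-zero count and byte swapping as single instructions (POPCNT, LZCNT, BSWAP) on several integer widths. The Python below only spells out what those operations compute at an arbitrary width; it does not use the instructions, and the function names are hypothetical.

```python
def _mask(x: int, bits: int) -> int:
    """Truncate x to an unsigned integer of the given bit width."""
    return x & ((1 << bits) - 1)


def popcount(x: int, bits: int) -> int:
    """Number of set bits in a `bits`-wide unsigned value (cf. x86 POPCNT)."""
    return bin(_mask(x, bits)).count("1")


def clz(x: int, bits: int) -> int:
    """Leading zeros of a `bits`-wide unsigned value; returns `bits` for 0 (cf. LZCNT)."""
    return bits - _mask(x, bits).bit_length()


def bswap(x: int, bits: int) -> int:
    """Reverse the byte order of a `bits`-wide unsigned value (cf. x86 BSWAP)."""
    nbytes = bits // 8
    return int.from_bytes(_mask(x, bits).to_bytes(nbytes, "little"), "big")


print(popcount(0xF0F0, 16), clz(1, 32), hex(bswap(0x12345678, 32)))
```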