I appreciate the response with a lot of interesting details; however, I don't believe it answers the question I had. My question was why the CPU design suffers from clock-frequency drops in AVX-512 workloads, whereas GPUs, which have much more compute power, do not.
I assumed it was because GPUs run at much lower clock frequencies and therefore have more power budget available, but as I also discussed with another commenter above, this was probably a premature conclusion, since we don't have enough evidence showing that GPUs indeed do not suffer from the same kind of issue. They likely do, but nobody has measured it yet?
The low clock frequency at which a CPU executes AVX-512 workloads is a frequency where it operates efficiently, i.e. with low energy consumption per operation executed.
For such a workload, which executes a very large number of operations per second, the CPU cannot afford to operate inefficiently, because it would overheat.
When a CPU core has many idle execution units that consume no power, for example when executing only scalar operations or only operations on narrow 128-bit vectors, it can afford to raise the clock frequency by, say, 50%, even if that increases the energy consumption per operation by, say, 3 times. Because it executes 4 to 8 times fewer operations per clock cycle, the total power consumption is still lower even at 3 times the energy per operation, so the CPU does not overheat. The desktop owner does not mind that completing the same workload requires much more energy, because the owner most likely cares more about the time to completion.
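To make the trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. The 1.5x clock boost, 3x energy-per-operation penalty, and 8x difference in operations per cycle are the illustrative numbers from the paragraph above, not measured values, and the results are unitless.

    # Power is roughly proportional to
    #   (operations per cycle) * (clock frequency) * (energy per operation).

    def relative_power(ops_per_cycle, clock_ghz, energy_per_op):
        """Unitless figure proportional to power draw."""
        return ops_per_cycle * clock_ghz * energy_per_op

    # Illustrative numbers, not measurements:
    avx512_wide = relative_power(ops_per_cycle=32, clock_ghz=4.0, energy_per_op=1.0)
    scalar_boost = relative_power(ops_per_cycle=4, clock_ghz=6.0, energy_per_op=3.0)

    print(avx512_wide)   # 128.0 -> wide AVX-512 code at the lower, efficient clock
    print(scalar_boost)  # 72.0  -> scalar code at 1.5x the clock and 3x the energy/op
    # The scalar case still draws less power, because it executes 8x fewer
    # operations per cycle, so the CPU can afford the higher, less efficient clock.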
The clock frequency of a GPU also varies continuously depending on the workload, in order to keep the power consumption within limits. However, a GPU is not designed to be able to increase its clock frequency as much as a CPU. The fastest GPUs have clock frequencies under 3 GHz, while the fastest CPUs exceed 6 GHz.
The reason is that nobody normally launches a GPU program that uses only a small fraction of the GPU's resources, which is what would allow a higher clock frequency, so it makes no sense to design a GPU for this use case.
Designing a chip for a higher clock frequency greatly increases its size, as shown by the comparison between a normal Zen core, designed for 5.7 GHz, and a Zen compact core, designed for e.g. 3.3 GHz, a frequency not much higher than that of a GPU.
On Zen compact cores, and on normal Zen cores configured for server CPUs with a large number of cores, the clock-frequency variation range is small, very similar to the clock variation range of a GPU. A 128-core server CPU has a total of 4096 FP32 ALUs, like a low-to-mid-range desktop GPU, or like a top desktop GPU of 5 years ago, while a Zen compact server CPU can have 6144 FP32 ALUs, more than an RTX 4070.
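For what it's worth, the ALU counts above follow from simple per-core arithmetic. The sketch below assumes 32 FP32 FMA lanes per Zen core (e.g. 2 x 512-bit FMA pipes), which matches the figures in this comment; the implied 192-core count comes from 6144 / 32, and the 5888 FP32 CUDA cores for the RTX 4070 are from NVIDIA's published specifications.

    # Counting FP32 ALUs (FMA lanes) in a many-core server CPU.
    # Assumption: 32 FP32 lanes per core, e.g. 2 x 512-bit FMA pipes of
    # 16 FP32 lanes each, consistent with the figures in the comment.

    FP32_LANES_PER_CORE = 2 * (512 // 32)   # = 32
    RTX_4070_FP32_CORES = 5888              # per NVIDIA's published specs

    def fp32_alus(cores: int) -> int:
        return cores * FP32_LANES_PER_CORE

    for cores in (128, 192):
        alus = fp32_alus(cores)
        verdict = "more" if alus > RTX_4070_FP32_CORES else "fewer"
        print(f"{cores}-core CPU: {alus} FP32 ALUs ({verdict} than an RTX 4070)")
    # 128-core CPU: 4096 FP32 ALUs (fewer than an RTX 4070)
    # 192-core CPU: 6144 FP32 ALUs (more than an RTX 4070)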
In conclusion, it is not that desktop/laptop CPUs drop their clock frequency, but that GPUs never raise their clock frequency much, just like server CPUs, because neither GPUs nor server CPUs normally run programs that keep most of their execution units idle, which is what allows higher clock frequencies without overheating.