> This is a significant problem on AMD; Intel and Apple seem to be better.

When did this change? In my testing years ago (while I was writing Rosetta 2, so on Ice Lake-era Intel), Intel only allowed a load to forward from a single store, and it did not allow partial forwarding (i.e., mixed cache/register) without a huge penalty. AMD, by contrast, at least allowed partial forwarding, or had a considerably lower penalty for it than Intel.

I don't know if AMD allows more or fewer _situations_, but empirically, I'm seeing a lot of total cycles lost to this on Zen 2 and 3, and much less on the Intel CPUs I've been testing (mostly Skylake derivatives and Alder Lake).

I haven't tested Zen 4 or 5, but I haven't heard anything that indicates they should be a lot better.

Interesting! IIRC, the LLVM passes dedicated to dodging this issue were contributed by Intel engineers, so maybe there’s some bias.
