What's frustrating is that there's no real reason regular DDR5 can't reach 1TB/sec with a sufficient number of channels. The manufacturers are just holding it back to drip-feed that memory bandwidth over several generations. Except for Apple, which lets you have 800GB/sec now, and will let you have 1TB/sec+ in the M5 Ultra next year. It's $$$$, but still - the true alternatives are much less cost-effective.
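To put numbers on that, here's a rough back-of-the-envelope (assuming 64-bit channels and theoretical peak rates, ignoring real-world overhead):

    import math

    # Peak bandwidth of one 64-bit DDR5 channel: MT/s * 8 bytes.
    def channel_gbs(mt_per_s):
        return mt_per_s * 8 / 1000

    target = 1000  # ~1 TB/s
    for speed in (4800, 6400, 8000):
        per_ch = channel_gbs(speed)
        print(f"DDR5-{speed}: {per_ch:.1f} GB/s per channel, "
              f"{math.ceil(target / per_ch)} channels for ~1 TB/s")

So roughly 20 channels of DDR5-6400 (or 16 of DDR5-8000) gets you there. Apple's 800GB/sec on the M2 Ultra is essentially the same trick: a very wide (1024-bit) LPDDR5 bus.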

Right, like the Mac Studio with the Ultra chip.

Or a dual-socket AMD Turin.

Or a Grace+Hopper (assuming you offload to the Hopper).

Latency is dictated by the laws of physics; more bandwidth is easy, but not cheap.

It can be relatively cheap too under the constraints imposed by typical AI workloads, at least when it comes to getting to 1TB/s or so. All you need is high-spec DDR5 and _a ton_ of memory channels in your SoC. During transformer inference you can easily make use of those parallel, multichannel reads. I get why you'd need HBM and several TB/s of memory bandwidth for extremely memory-intensive training workloads. But for inference, 1TB/s gives you a lot to work with (especially if your model is an MoE), and it doesn't have to be ultra-expensive.
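Rough sketch of why 1TB/s is plenty for inference: at batch size 1, decode is bandwidth-bound, since every generated token has to stream the active weights from memory once. So tokens/sec tops out around bandwidth divided by active-weight bytes (this ignores KV-cache traffic and compute; the model sizes below are illustrative, not any specific model):

    # Decode-speed ceiling: tokens/s <= bandwidth / bytes of active weights.
    def max_tok_per_s(bw_gbs, active_params_billion, bytes_per_param):
        return bw_gbs / (active_params_billion * bytes_per_param)

    BW = 1000.0  # ~1 TB/s, the figure under discussion

    # Dense 70B model, 8-bit weights: all 70 GB read per token.
    print(f"dense 70B @ 8-bit: {max_tok_per_s(BW, 70, 1):.0f} tok/s")

    # MoE with ~40B active params: only the routed experts are streamed.
    print(f"MoE, 40B active @ 8-bit: {max_tok_per_s(BW, 40, 1):.0f} tok/s")

That's ~14 tok/s dense vs ~25 tok/s for the MoE from the same 1TB/s, which is why MoE models are such a good fit for these wide-memory-bus boxes.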