CPU, yes, but more importantly memory bandwidth.

An RTX 3090 (as one example) has nearly 1 TB/s of memory bandwidth (936 GB/s of GDDR6X). You'd need at least 12 channels of the fastest proof-of-concept DDR5 on the planet to match that.
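To put rough numbers on that (a back-of-envelope sketch; I'm taking the 3090's 936 GB/s spec and reading "fastest proof-of-concept DDR5" as DDR5-10000, i.e. 10,000 MT/s over a 64-bit channel):

    # Each DDR5 channel moves 8 bytes per transfer.
    gpu_bw_gbs = 936                        # RTX 3090, GDDR6X
    ddr5_channel_gbs = 10_000 * 8 / 1000    # DDR5-10000 -> 80 GB/s per channel
    print(gpu_bw_gbs / ddr5_channel_gbs)    # ~11.7, so 12 channels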

If you have a discrete GPU, use an implementation that takes advantage of it; it's a completely different story.

Apple Silicon posts impressive numbers on LLM inference because it has a unified, high-bandwidth CPU+GPU memory architecture (400 GB/s on the M1 Max, IIRC).
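The reason bandwidth dominates: for single-stream token generation, essentially every weight has to be read from memory once per token, so memory bandwidth divided by model size in bytes puts a hard ceiling on tokens/sec. A rough sketch (the 7B model size, 4-bit quantization, and bandwidth figures are illustrative assumptions, not measurements):

    def max_tokens_per_sec(mem_bw_gbs, params_b, bytes_per_param):
        """Bandwidth-bound ceiling: all weights streamed once per token."""
        return mem_bw_gbs * 1e9 / (params_b * 1e9 * bytes_per_param)

    # 7B-parameter model, 4-bit quantized (~0.5 bytes/param):
    print(max_tokens_per_sec(400, 7, 0.5))  # unified memory: ~114 tok/s ceiling
    print(max_tokens_per_sec(102, 7, 0.5))  # 2ch DDR5-6400:  ~29 tok/s ceiling

Real throughput lands well below these ceilings, but the ratio between them is why the same model feels so much faster on high-bandwidth memory.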
