Hacker News new | past | comments | ask | show | jobs | submit login

AFAIK you have to think about how many different 512b paths are being driven when this happens, like each cycle in the steady-state case is simultaneously (in the case where you can do two vfmadd132ps per cycle):

- Capturing 2x512b from the L1D cache

- Sending 2x512b to the vector register file

- Capturing 4x512b values from the vector register file

- Actually multiplying 4x512b values

- Sending 2x512b results to the vector register file

.. and probably more?? That's already like 14*512 wires [switching constantly at 5Ghz!!], and there are probably even more intermediate stages?




… per core. There are eight per compute tile!

I like to ask IT people a trick question: how many numbers can a modern CPU multiply in the time it takes light to cross a room?




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: