
> At a bare minimum, AVX grossly accelerates memcpy and memset operations.

Not necessarily. For large operations, and depending on processor generation, a plain rep stosb/movsb is simpler (no alignment requirements) and will saturate your memory bandwidth just as well as any AVX sequence, with less icache pressure.
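For anyone who wants to try this, here is a minimal sketch of a copy built on rep movsb. It assumes GCC or Clang inline asm on x86-64 and falls back to memcpy elsewhere; the name copy_rep_movsb is made up for illustration, and whether it actually matches an AVX loop depends on the CPU and the copy size, so measure before relying on it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): copy n bytes with rep movsb.
 * Assumes the direction flag is clear, as the x86-64 ABIs require at
 * function boundaries. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *d = dst;
    /* rep movsb copies RCX bytes from [RSI] to [RDI] and updates all
     * three registers, so they are declared as read-write operands. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    /* Portable fallback for other targets or compilers. */
    return memcpy(dst, src, n);
#endif
}
```

Call it like memcpy: copy_rep_movsb(dst, src, len). There is no alignment handling and no size dispatch, which is the point of the comparison.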




I wasn't aware of ERMSB ("Enhanced REP MOVSB"). Thanks for the tip.

It seems to be a feature of Ivy Bridge and later, which is around the time AVX2 started (AVX2 arrived one generation later, with Haswell).
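If you'd rather check at runtime than go by processor generation, ERMSB is reported in CPUID leaf 7, subleaf 0, EBX bit 9. A rough sketch, assuming GCC or Clang and their <cpuid.h> header; cpu_has_ermsb is a made-up name, and the function simply returns false on non-x86 targets.

```c
#include <stdbool.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* Hypothetical helper: true if the CPU advertises Enhanced REP MOVSB/STOSB
 * (CPUID.(EAX=07H, ECX=0):EBX bit 9). */
static bool cpu_has_ermsb(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid_count returns 0 when leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ((ebx >> 9) & 1u) != 0;
#else
    return false;
#endif
}
```

A memcpy dispatcher could use this once at startup to choose between a rep movsb path and a vector loop.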


To be fair, it's issuing a long sequence of microcode ops under the hood, using 256-bit operations.



