This article title should have "(2004)" added; this is seriously old information.
For modern use, something about ARM CPUs would be much more useful since that's what microcontrollers all use now. No one's doing ASM programming on x86 CPUs these days (and certainly not Pentium 4 CPUs).
Full programs written in assembly may be rare, but for performance analysis and optimization I think knowledge of these kinds of tricks (probably updated for the N generations since 2004, of course) is still relevant.
For instance, Daniel Lemire's blog [1] is featured here quite often, and it frequently covers very low-level performance analysis and improvements.
Try Eigen then, where people were squeezing out every last ounce of performance. Even then, it sometimes has trouble matching MKL or NVIDIA's libraries for ultimate performance.
> No one's doing ASM programming on x86 CPUs these days
I don't think that's entirely true... it's still pretty common to write high-performance or performance-sensitive computation kernels in assembly or intrinsics.