In the beginning (8086/8088) because it was a shorter instruction encoding than ...

dfox · on Sept 18, 2019

The reason it is special-cased is that on a pipelined or OoO CPU, xor ax,ax would otherwise be significantly slower than a straight mov, as xor has a dependency on its operand registers.

On a similar note, on many RISC architectures NOP is actually something like ADD r0, r0, r0, and that too is usually special-cased in the hazard stall and result forwarding logic (although usually the special-cased part is “ignore hazards that involve r0”).

aflag · on Sept 18, 2019

Adding the same register to itself and then assigning the value to itself is not really a nop. Why is that instruction forbidden and converted to a nop instead?

pwg · on Sept 18, 2019

Because in some RISC architectures, the first (R0) or last (R31, or whatever the largest register number is) register is also a 'special' register in that it is hardwired to zero: any reads return zero, and writes are simply discarded.

In this instance, if the register always reads as zero, and can never be changed, then ADD R0,R0,R0 is, in effect, a NOP, so it gets special cased, and doing so avoids having to allocate an additional opcode explicitly to the "NOP" instruction.
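To make the "reads return zero, writes are discarded" behaviour concrete, here is a minimal Python sketch of a toy register file. It is not modelled on any particular ISA; the class name `RegisterFile`, the helper `add`, and the choice of register 0 as the zero register are all assumptions made for illustration.

```python
class RegisterFile:
    """Toy register file in which register 0 is hardwired to zero (assumed layout)."""

    def __init__(self, count=32):
        self._regs = [0] * count

    def read(self, n):
        # Reads of r0 always return zero, whatever was "written" to it.
        return 0 if n == 0 else self._regs[n]

    def write(self, n, value):
        # Writes to r0 are silently discarded.
        if n != 0:
            self._regs[n] = value


def add(rf, rd, rs1, rs2):
    """Hypothetical ADD rd, rs1, rs2 on the toy register file."""
    rf.write(rd, rf.read(rs1) + rf.read(rs2))


rf = RegisterFile()
rf.write(5, 42)

before = [rf.read(i) for i in range(32)]
add(rf, 0, 0, 0)  # ADD r0, r0, r0: nothing visible changes, so it acts as a NOP
after = [rf.read(i) for i in range(32)]
nop_changed_nothing = before == after

add(rf, 6, 5, 0)  # ADD r6, r5, r0: the same opcode doubles as a register move
moved = rf.read(6)
```

In this sketch the same ADD also gives you a register-to-register move for free, which is the same kind of opcode saving as the NOP case.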

Anticipating the question of "why is Rx (0 or 31 or ??) hardwired to zero?", that one is because it is useful for:

1) obtaining a zero without having to perform a load from memory

2) creating additional addressing modes by reusing another existing addressing mode

For #2, if the RISC arch. implements, say, base plus index addressing where two registers are added together to obtain a final address, using R0 as one of the inputs creates a direct addressing mode from the base plus index mode.

So base plus index could be written as load R5, R6+R7. Substituting R0 (assuming it is the hardwired-to-zero register) for R6 (or R7) results in directly addressing from the value in the other register, converting an 'indexed' addressing mode into a 'direct' mode without having to add a 'mode select bit' to the actual instruction. The result is that the chip only needs hardware for a single addressing mode, but the programmer has two addressing modes available for their use.

If memory serves, the DEC Alpha made large use of tricks like this. The hardware only implemented a small handful of addressing modes (say 4-5), yet the full set of addressing modes exposed to the programmer was two or three times larger due to creative uses of "zero stuffing" into the actual hardware modes.
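The base-plus-index trick can be sketched in a few lines of Python. This is only an illustration under assumed names: `load_indexed` stands in for the single mode the hardware implements, `load_direct` is a hypothetical assembler-level alias, and register 0 is assumed to be the zero register.

```python
ZERO = 0  # assumed index of the hardwired-to-zero register


def read_reg(regs, n):
    # The zero register always reads as 0.
    return 0 if n == ZERO else regs[n]


def load_indexed(memory, regs, rd, ra, rb):
    """load rd, ra+rb: the only addressing mode the toy 'hardware' implements."""
    address = read_reg(regs, ra) + read_reg(regs, rb)
    if rd != ZERO:  # writes to the zero register are discarded
        regs[rd] = memory[address]


def load_direct(memory, regs, rd, ra):
    """Hypothetical pseudo-mode: the same instruction with R0 as the index."""
    load_indexed(memory, regs, rd, ra, ZERO)


memory = {100: 11, 108: 22}
regs = [0] * 8
regs[6] = 100
regs[7] = 8

load_indexed(memory, regs, 5, 6, 7)  # load R5, R6+R7 reads address 108
indexed_value = regs[5]

load_direct(memory, regs, 4, 6)      # load R4, R6+R0 reads address 100
direct_value = regs[4]
```

Note that `load_direct` adds no new logic of its own: the second "mode" exists purely because one operand happens to read as zero.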

wscott · on Sept 18, 2019

1995 in Pentium Pro. As the other guy says, the real special part is that it writes zero to ax without reading ax. So it breaks a dependency chain if you do this in a loop.

The important tuning piece for OOO processors is removing dependencies between calculations.
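A rough way to see the effect is a toy dataflow model in Python. It assumes perfect register renaming and unlimited execution units, so the only limit is data dependencies; the latencies and the three-instruction loop body are made up for illustration and do not describe any real CPU.

```python
LATENCY = {"xor": 1, "add": 1, "mul": 3}  # assumed latencies, in cycles


def critical_path(instructions, recognize_zero_idiom):
    """Return the dataflow-limited finish time in cycles.

    Each instruction is (op, dest, sources). Walking in program order and
    tracking the latest writer of each register models ideal renaming.
    """
    ready = {}  # register -> cycle at which its newest value is available
    finish = 0
    for op, dest, sources in instructions:
        is_zero_idiom = op == "xor" and len(sources) == 2 and sources[0] == sources[1]
        if recognize_zero_idiom and is_zero_idiom:
            sources = ()  # result is always zero, so no inputs need to be waited on
        start = max((ready.get(r, 0) for r in sources), default=0)
        done = start + LATENCY[op]
        ready[dest] = done
        finish = max(finish, done)
    return finish


# Hypothetical loop body: xor ax,ax / add ax,bx / mul ax,cx, unrolled 4 times.
body = [
    ("xor", "ax", ("ax", "ax")),
    ("add", "ax", ("ax", "bx")),
    ("mul", "ax", ("ax", "cx")),
]
program = body * 4

chained = critical_path(program, recognize_zero_idiom=False)
independent = critical_path(program, recognize_zero_idiom=True)
```

Without the special case, each xor waits for the previous iteration's mul, so the four iterations serialize. With it, every iteration starts from a fresh ax and in this idealized model they can all overlap.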