I'm not super familiar with ARM / ARM64 assembly and was confused as to how x0 was incremented.
Was going to ask here, but decided to not be lazy and just look it up.
const float f = *data++;
ldr s1, [x0], #4
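Turns out this is post-indexed addressing: the instruction loads 4 bytes from the address in x0 into s1, then adds 4 to x0, all in one instruction.
It looks like the offset can be negative too (it's a signed 9-bit immediate, so -256 to 255), so you could iterate over something in reverse.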
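For anyone who wants to check this against their own disassembly, here is a minimal C sketch of a forward and a reverse loop. The names sum_forward and sum_reverse are made up for illustration. Whether the compiler actually emits a writeback load depends on the compiler, target and optimization level, so treat the assembly in the comments as something it may produce, not a guarantee. The reverse loop uses *--p (decrement, then load) rather than *p-- so the pointer never steps before the start of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walks forward with *data++.
   On AArch64 a compiler may lower the load to: ldr s1, [x0], #4 */
float sum_forward(const float *data, size_t n) {
    assert(data != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        const float f = *data++;  /* load, then advance by sizeof(float) */
        total += f;
    }
    return total;
}

/* Hypothetical helper: walks backward from one past the end with *--p.
   Decrement-then-load matches the pre-indexed form: ldr s1, [x0, #-4]! */
float sum_reverse(const float *data, size_t n) {
    assert(data != NULL || n == 0);
    const float *p = data + n;    /* one past the last element */
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        total += *--p;            /* step back, then load */
    }
    return total;
}
```

Compiling this for an ARM64 target with optimizations on and reading the output is the easiest way to see which addressing mode the compiler picked.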
Kind of cool, I don't think x86_64 has a single instruction that can load and increment in one go.
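Turns out it does: the lods string instructions (lodsb/lodsw/lodsd/lodsq) load from the address in rsi and then step rsi. Oh cool, and it looks like they can also decrement, by 1/2/4/8 depending on the operand size.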
> After the byte, word, or doubleword is transferred from the memory location into the AL, AX, or EAX register, the (E)SI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)SI register is incremented; if the DF flag is 1, the ESI register is decremented.) The (E)SI register is incremented or decremented by 1 for byte operations, by 2 for word operations, or by 4 for doubleword operations.
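
To make the quoted behaviour concrete, here is a rough C model of it. This is only a sketch: struct lods_state and lods_model are invented names, memory is modelled as a plain byte array with esi as an offset into it, and only the byte/word/doubleword forms from the quote are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state lods touches (names are illustrative). */
struct lods_state {
    uint32_t eax;  /* destination: AL / AX / EAX */
    size_t   esi;  /* source index, modelled as an offset into mem */
    int      df;   /* direction flag: 0 = increment, 1 = decrement */
};

/* Load `width` bytes (1, 2, or 4) from mem[esi] into the low bytes of eax,
   then step esi by `width` in the direction selected by df. */
void lods_model(struct lods_state *s, const uint8_t *mem, size_t width) {
    assert(width == 1 || width == 2 || width == 4);

    uint32_t value = 0;
    for (size_t i = 0; i < width; ++i) {
        /* x86 is little-endian: lowest address holds the lowest byte */
        value |= (uint32_t)mem[s->esi + i] << (8 * i);
    }

    /* Byte and word loads replace only AL / AX; the rest of EAX is kept. */
    uint32_t mask = (width == 4) ? 0xFFFFFFFFu : ((1u << (8 * width)) - 1u);
    s->eax = (s->eax & ~mask) | value;

    s->esi = s->df ? s->esi - width : s->esi + width;
}
```

With df set to 0, repeated calls walk forward through mem like the ARM post-increment load; with df set to 1 they walk backward.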