>* IFE, IFN, IFG, IFB take 2 cycles, plus the cost of a and b, plus 1 if the test fails*
It's been a long time since I worked in assembly, but I don't remember comparison instructions having different timings depending on the result, back when I were a lad.
(FYI, most of the assembler I played with was for 6502s (in the old beebs) with a little for the Z80 family and early x86)
Many CPUs with branch prediction carry a penalty of at least one cycle for mispredicted branches, as the fetch stage(s) of the pipeline must be invalidated. From Wikipedia:
>* The time that is wasted in case of a branch misprediction is equal to the number of stages in the pipeline from the fetch stage to the execute stage. Modern microprocessors tend to have quite long pipelines so that the misprediction delay is between 10 and 20 clock cycles. The longer the pipeline the higher the need for a good branch predictor.*
EDIT: the inclusion of this is somewhat interesting, as there's not much point in simulating a pipelined processor unless you care about hardware details. My best guess is that they're adding this "feature" to make compilation to the assembly MORE difficult and to increase the advantage of hand-written assembly. Branch behaviour is a tricky thing for compilers to get right.
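As a toy illustration of the rule quoted above (the numbers are made up, not any real CPU): if the fetch-to-execute distance is 5 stages, every mispredicted branch throws away 5 cycles of fetched work.

```python
# Toy pipeline model: one instruction completes per cycle, and a
# mispredicted branch flushes everything between fetch and execute.
FETCH_TO_EXECUTE_STAGES = 5  # assumed depth for illustration only


def branch_cycles(trace):
    """Count cycles spent on a sequence of branches.

    `trace` is a list of (actually_taken, predicted_taken) pairs.
    Each branch costs 1 cycle, plus a flush penalty equal to the
    fetch-to-execute distance when the prediction was wrong.
    """
    cycles = 0
    for taken, predicted in trace:
        cycles += 1
        if taken != predicted:
            cycles += FETCH_TO_EXECUTE_STAGES
    return cycles


# Two branches, one predicted correctly and one not:
# 1 + (1 + 5) = 7 cycles.
print(branch_cycles([(True, True), (True, False)]))
```

The DCPU-16's flat "+1 if the test fails" is the degenerate case of this: a one-stage lookahead that always predicts the test will succeed.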
Ah, obviously my experience is somewhat out of date! My "hacking cycles off loops in assembler" days were all before I got my hands on any kit advanced enough for pipelining and branch prediction to be a consideration.
It seems to me the extra cycle is spent reading the first word of the next instruction: the CPU needs to know how long that instruction is in order to skip it.
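That length really is recoverable from the first word alone. A sketch, assuming the DCPU-16 1.1 encoding (a basic instruction packs `bbbbbbaaaaaaoooo` into one word, and operand values 0x10-0x17, 0x1e, and 0x1f each consume a following word):

```python
def operand_uses_next_word(v):
    # Per the 1.1 spec: [next word + register] (0x10-0x17),
    # [next word] (0x1e), and next-word literal (0x1f) each
    # pull one extra word into the instruction.
    return 0x10 <= v <= 0x17 or v in (0x1e, 0x1f)


def instruction_length(first_word):
    """Length in words, decoded from the first word only."""
    opcode = first_word & 0xF
    a = (first_word >> 4) & 0x3F
    b = (first_word >> 10) & 0x3F
    if opcode == 0:
        # Non-basic instruction (e.g. JSR): the a field holds the
        # extended opcode and b is the sole operand.
        return 1 + int(operand_uses_next_word(b))
    return (1 + int(operand_uses_next_word(a))
              + int(operand_uses_next_word(b)))


# Examples from the spec's sample listing:
print(instruction_length(0x7c01))  # SET A, 0x30      -> 2 words
print(instruction_length(0x7de1))  # SET [0x1000], 0x20 -> 3 words
```

So a failed IF* only has to fetch and partially decode one word to know where the following instruction ends, which fits the flat one-cycle skip penalty.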