Many CPUs with branch prediction carry a penalty of at least one cycle for a mispredicted branch, as the instructions already in the fetch stage(s) of the pipeline must be invalidated. From Wikipedia:
The time that is wasted in case of a branch misprediction is equal to the number of stages in the pipeline from the fetch stage to the execute stage. Modern microprocessors tend to have quite long pipelines so that the misprediction delay is between 10 and 20 clock cycles. The longer the pipeline the higher the need for a good branch predictor.
EDIT: the inclusion of this is somewhat interesting, as there's not much point in simulating a pipelined processor unless you care about hardware details. My best guess is they're adding this "feature" to make compiling to assembly MORE difficult and to increase the advantage of hand-written assembly. Branch prediction is a tricky thing for compilers to get right.
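To put rough numbers on the penalty quoted from Wikipedia, here is a back-of-the-envelope sketch in Python. It is only the usual textbook cost model (average cycles per instruction = ideal CPI + branch fraction x misprediction rate x penalty), not a description of any particular CPU. The function name `effective_cpi` and the workload figures (ideal CPI of 1.0, one instruction in five being a branch) are assumptions picked purely for illustration.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction stalls are included.

    Simple model: every mispredicted branch costs a fixed number of extra cycles.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload (illustrative, not measured): ideal CPI of 1.0,
# and 1 in 5 instructions is a branch.
BASE_CPI = 1.0
BRANCH_FRACTION = 0.2

# Penalties: the 1-cycle minimum, plus the 10 and 20 cycle figures from the quote.
for penalty in (1, 10, 20):
    for rate in (0.02, 0.10, 0.50):
        cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, rate, penalty)
        slowdown = cpi / BASE_CPI - 1
        print(f"penalty={penalty:2d} cycles, mispredict rate={rate:.0%}: "
              f"CPI={cpi:.2f} ({slowdown:.0%} slower)")
```

Under those assumptions, a 20-cycle penalty with a 10% misprediction rate gives 1.0 + 0.2 x 0.1 x 20 = 1.4 cycles per instruction, i.e. about 40% slower than the ideal case, which is why the quote says a longer pipeline needs a better predictor.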
Ah, obviously my experience is somewhat out of date! My "hacking cycles off loops in assembler" days were all before I got my hands on any kit advanced enough for pipelining and branch prediction to be a consideration.
http://en.wikipedia.org/wiki/Branch_predictor