I am not sure you are familiar with modern GPU architecture. Neither AMD's nor NVIDIA's GPUs have any problem with branches. They do not do prediction or prefetch, because that is pretty pointless on a single-issue architecture. I believe the ISA docs are available to the general public, so you could easily familiarize yourself with them. I am also quite familiar with latency and bandwidth, so the idea of negating one with the other sounds very amateurish to me. If you could do that, everyone would have switched to high-bandwidth memory and negated all the latency :) Speed is still speed and bandwidth is still bandwidth.
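
To put rough numbers on the latency vs. bandwidth point, here is a minimal sketch using the simplest possible model: time for one access = fixed latency + size / bandwidth. The 500 ns latency and 100 GB/s bandwidth are made-up illustrative values, not measurements of any real GPU, and the function names are just mine.

```python
# Toy model: one memory access costs a fixed latency plus size / bandwidth.
# All constants below are hypothetical, chosen only for illustration.

LATENCY_S = 500e-9          # assumed fixed latency: 500 ns
BANDWIDTH_B_PER_S = 100e9   # assumed bandwidth: 100 GB/s


def transfer_time_s(size_bytes, latency_s, bandwidth_b_per_s):
    """Time to complete one transfer under the latency + size/bandwidth model."""
    return latency_s + size_bytes / bandwidth_b_per_s


def speedup_from_doubling_bandwidth(size_bytes):
    """How much faster one transfer gets if bandwidth doubles and latency stays put."""
    before = transfer_time_s(size_bytes, LATENCY_S, BANDWIDTH_B_PER_S)
    after = transfer_time_s(size_bytes, LATENCY_S, 2 * BANDWIDTH_B_PER_S)
    return before / after


if __name__ == "__main__":
    # A small access is dominated by latency; a bulk copy is dominated by bandwidth.
    for size in (64, 4096, 1_000_000_000):
        print(size, "bytes: speedup", round(speedup_from_doubling_bandwidth(size), 4))
```

In this model the 64-byte access barely gets faster when bandwidth doubles, while the 1 GB copy gets close to 2x faster. The latency term does not go away no matter how much bandwidth you add.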