Could somebody familiar with the tech explain the Midgard execution model described on page 6 of the article? I don't understand how each arithmetic pipeline of the Midgard is "essentially it's own cpu". If that is the case, isn't it ideal, since I get a vast array of independent SIMD units? What is the downside?
I could imagine that independent SIMD units each need their own instruction decoder and branch unit, whereas in the warp/wavefront model many ALUs share those units, saving die space and energy. If the executed workloads are mostly coherent, those per-unit resources are wasted.
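To make that trade-off concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not a description of how Midgard or any real GPU schedules work: the `simulate` function, the cycle costs, and the idea of counting one front end (decoder plus branch unit) per lockstep group are all made up for illustration. Lanes in a group run in lockstep, so a group whose lanes disagree on a branch has to execute both sides.

```python
def simulate(branch_taken, width, cost_if=4, cost_else=6):
    """Toy model: lanes are grouped into lockstep groups of `width` lanes.

    Each group has one front end (hypothetical: decoder + branch unit).
    A group pays for the 'if' side when any lane takes the branch and for
    the 'else' side when any lane does not. Groups run in parallel, so the
    total time is that of the slowest group.
    """
    groups = [branch_taken[i:i + width]
              for i in range(0, len(branch_taken), width)]
    cycles = [
        (cost_if if any(g) else 0) + (cost_else if not all(g) else 0)
        for g in groups
    ]
    return {"front_ends": len(groups), "cycles": max(cycles, default=0)}


coherent = [True] * 8          # every lane takes the same path
divergent = [True, False] * 4  # neighbouring lanes disagree

for name, lanes in (("coherent", coherent), ("divergent", divergent)):
    print(name, "wavefront of 8:", simulate(lanes, width=8))
    print(name, "independent:   ", simulate(lanes, width=1))
```

In this model the coherent workload finishes in the same number of cycles either way, so the independent design pays for eight front ends where one would do. Only the divergent workload gets anything back for the extra hardware.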
It's interesting that ARM switched to independent ALUs/SIMDs, because if I understand it correctly, that is exactly what PowerVR was doing with its SGX architecture before it switched to a wavefront-like execution model. In a way, the two companies are moving in opposite directions.