A significant chunk of DRAM latency comes directly from the speed of electricity, so there is only so much room for improvement without drastically changing how motherboards are arranged. However, when you compare how long it takes to update the CPU cache with how long the first bytes take to come back, there is a little more room for improvement.
At 800 MHz a signal can make a round trip of only around 12 cm per cycle, so you're stuck with 2+ cycles out of a 5-cycle delay.
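A rough sanity check of that number, reading the 800 figure as an 800 MHz memory clock. The 0.65 velocity factor (signal speed in a board trace as a fraction of c) and the 25 cm CPU-to-DRAM trace length are my own assumptions for illustration, not measured values, and the function names are made up:

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def round_trip_reach_m(clock_hz, velocity_factor=0.65):
    """Farthest point a signal can reach and return from within one clock cycle.

    velocity_factor is an assumed propagation speed as a fraction of c.
    """
    period_s = 1.0 / clock_hz
    # Distance covered in one cycle, halved because the signal goes out and back.
    return velocity_factor * C * period_s / 2.0


def round_trip_cycles(distance_m, clock_hz, velocity_factor=0.65):
    """Clock cycles spent purely on propagation for a one-way distance."""
    return distance_m / round_trip_reach_m(clock_hz, velocity_factor)


reach = round_trip_reach_m(800e6)
print(f"round-trip reach per cycle: {reach * 100:.1f} cm")

# Hypothetical 25 cm trace between the memory controller and the DRAM.
print(f"cycles lost to propagation: {round_trip_cycles(0.25, 800e6):.2f}")
```

With those assumptions the reach comes out at roughly 12 cm, and a trace in the 25 cm range already costs a bit over two cycles before the DRAM has done any work. A different velocity factor or trace length shifts the result proportionally.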
From reading the article, it seems that the claim is that most of the power is used in communicating with the rest of the system - and that is handled by only a part of the cube structure. The rest of the cube uses a lot less power, so would generate far less heat than normal DDRx chips.