The chip architecture described in the article reminds me of the DEC MasPar system [1] we had at uni back in the mid-90s. It had 2048 processors (IIRC), and each processor could only communicate directly with its 8 neighbours. If you wanted to get decent performance out of it, you had to think carefully about how you were going to get your data onto each of the processors.
[1] http://en.wikipedia.org/wiki/MasPar