You need a ton of specialized knowledge to use compute effectively.
If we had infinite memory and infinite compute, we'd just throw every problem of length n at a tensor in R^(n^n).
The issue is that we don't have enough memory in the world to store that tensor for something as trivial as MNIST (and won't until the 2100s). And as you can imagine, n^n grows a bit faster than a plain exponential, so we never will.
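A back-of-envelope sketch of just how far off we are, assuming n = 784 (a flattened 28x28 MNIST image), 4 bytes per float32 parameter, and a rough guess of ~10^23 bytes for all storage on Earth (these figures are assumptions, not sourced):

```python
import math

# n = 784: a flattened 28x28 MNIST image (assumption for illustration).
n = 784

# The fully general tensor in R^(n^n) has n**n entries; work in log10
# because the number itself has thousands of digits.
log10_params = n * math.log10(n)            # log10 of n**n, ~2269
log10_bytes = log10_params + math.log10(4)  # at 4 bytes per float32

# Rough assumed upper bound on total global storage: ~10^23 bytes.
log10_world_storage = 23

print(f"log10(parameter count) ≈ {log10_params:.0f}")
print(f"shortfall: ~10^{log10_bytes - log10_world_storage:.0f}x more bytes than exist")
```

So even under generous assumptions, the gap isn't a few orders of magnitude; it's thousands of orders of magnitude.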
Then how does this invalidate the bitter lesson? It's like saying that if aerodynamics were true, we'd have planes flying like insects by now. But that's simply not how it works at large scales, especially if you want to build something economical.
Because if the bitter lesson were true, no one would be wasting their time with convolutions or attention blocks. You'd just replace them with the general tensor that allows every possible hyper-relation between all points instead.
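To put rough numbers on that trade-off: a sketch comparing parameter counts for a small conv layer, a dense all-pairs layer, and the fully general order-n interaction tensor, on the same MNIST-sized input. The specific layer shapes (a 3x3 kernel, 1 input and 32 output channels) are illustrative assumptions, not any particular architecture:

```python
import math

# Input of n = 784 values (28x28 MNIST), shapes below are assumptions.
n = 784

conv = 3 * 3 * 1 * 32            # 3x3 conv kernel, 1 in / 32 out channels
pairwise = n * n                 # dense layer: one weight per pair of points
log10_full = n * math.log10(n)   # order-n tensor: every hyper-relation at once

print(f"conv:     {conv} params")             # hundreds
print(f"pairwise: {pairwise} params")         # hundreds of thousands
print(f"full:     ~10^{log10_full:.0f} params")  # thousands of digits
```

The specialized structures are exactly what makes the compute budget usable at all; the "general" solution is off the table by construction.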