It comes down to the limits of available Information with a capital 'I'. If you're working within the encoding system (as you're recommending here with the "all the text in the world" approach), then in order to learn the function that's generating this information, the messages you're examining need to convey at least a minimum amount of information. There has to be enough visible structure within the messages themselves to make the underlying signal clear.
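To put a rough number on "information in the messages themselves", here's a toy sketch (my own illustration, not anything from the original argument): estimate the bits per character of a text sample under a naive unigram model, as one crude proxy for how much explicit signal the text alone carries.

```python
from collections import Counter
from math import log2

def unigram_entropy_bits_per_char(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical sample text, just to make the script runnable end to end.
sample = "the quick brown fox jumps over the lazy dog " * 50
print(f"~{unigram_entropy_bits_per_char(sample):.2f} bits/char under a unigram model")

# Real English is far more redundant than this naive estimate suggests
# (Shannon's experiments put it around 1 bit/char once longer-range structure
# is accounted for), which is the sense in which the visible signal in the
# text alone is thin relative to everything a reader reconstructs from it.
```

That gap, between the few bits per character actually on the wire and everything we reconstruct on the receiving end, is the point I'm gesturing at below.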
I don't think it's so weird to imagine that natural language really doesn't convey a ton of explicit information on its own. Sure, there's some there, enough that our current AI attempts can solve little corners of the bigger problem. But is it so strange to imagine that the machinery of the human brain takes lossy, low-information language and expands, extrapolates, and interprets it so heavily that the result is orders of magnitude more complex than the lossy, narrow channel through which it was conveyed? That the only reason we're capable of learning language and understanding each other (the times we _do_ understand each other) is because we all come pre-equipped with the same decryption hardware?