The "world model" of an LLM is just the set of [deep] predictive patterns that it was induced to learn during training. There is no magic here - the model is just trying to learn how to auto-regressively predict training set continuations.
Of course the humans who created the training set samples didn't create them auto-regressively. The samples are artifacts reflecting an external world, and knowledge about it, that the model is not privy to. But the model is limited to minimizing training error on the task it was given - auto-regressive prediction. It has no choice. The "world model" (patterns) it has learnt isn't some magical grokking of that external world - it is just the set of patterns needed to minimize error when attempting to auto-regressively predict training set continuations.
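To make "minimizing training errors" concrete: the entire training signal is next-token cross-entropy, roughly as in this toy sketch in PyTorch (shapes and names are illustrative only, not taken from any particular codebase):

    import torch
    import torch.nn.functional as F

    # Toy illustration of the autoregressive objective: the model's logits at
    # position t are scored against the actual token at position t+1.
    vocab_size, seq_len = 100, 8
    tokens = torch.randint(0, vocab_size, (1, seq_len))   # a "training sample"
    logits = torch.randn(1, seq_len, vocab_size)          # stand-in for model output

    # Shift by one: predict token[t+1] from the prefix ending at token[t].
    pred = logits[:, :-1, :].reshape(-1, vocab_size)
    target = tokens[:, 1:].reshape(-1)
    loss = F.cross_entropy(pred, target)  # the only signal the model is trained on
    print(loss.item())

Everything the model "knows" is whatever happened to push that number down across the training set.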
Whether these training set predictive patterns result in the model performing as you might hope on an unseen text depends on the similarity of that text to samples in the training set.
>Whether these training set predictive patterns result in the model performing as you might hope on an unseen text depends on the similarity of that text to samples in the training set.
>similarity
Yes, except the computer can easily 'see' in far more than 3 dimensions, with more capacity to spot similarities (a toy sketch of what that looks like is below), and it can follow lines of prediction (similar to chess) far deeper than any group of humans can.
That super-human ability to spot similarities and walk latent spaces 'randomly' - yet uncannily - has given rise to emergent phenomena that mimic proto-intelligence.
We have no idea what ideas these tokens have embedded at different layers, or what capabilities can emerge now, at deployment time later, or given a certain prompt.
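Mechanically, "seeing similarity in high dimensions" is little more than measuring angles between embedding vectors. A minimal sketch, with the 768-dimensional size picked arbitrarily for illustration:

    import numpy as np

    # Hypothetical 768-dimensional embeddings; real models work in hundreds to
    # thousands of dimensions, not 3.
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=768), rng.normal(size=768)

    def cosine_similarity(u, v):
        # "Spotting similarity" in embedding space is just an angle measurement.
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine_similarity(a, b))  # near 0 for unrelated random vectors
    print(cosine_similarity(a, a))  # 1.0 for identical vectors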
The inner workings/representations of transformers/LLMs aren't a total black box - there's a lot of work being done (and published) on "mechanistic interpretability", especially by Anthropic.
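As a taste of the most basic starting point for that work, you can already read out a model's intermediate activations yourself. Here's a minimal sketch using forward hooks, with GPT-2 via Hugging Face transformers purely as a convenient small stand-in (Anthropic's published interpretability work is on their own models and far more sophisticated than this):

    import torch
    from transformers import AutoTokenizer, GPT2Model

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2").eval()

    captured = {}

    def save_activation(layer_idx):
        def hook(module, inputs, output):
            # output[0] is the block's hidden state: (batch, seq_len, hidden_dim)
            captured[layer_idx] = output[0].detach()
        return hook

    handles = [block.register_forward_hook(save_activation(i))
               for i, block in enumerate(model.h)]

    with torch.no_grad():
        model(**tokenizer("The capital of France is", return_tensors="pt"))

    for h in handles:
        h.remove()

    print({i: tuple(t.shape) for i, t in captured.items()})  # e.g. {0: (1, 5, 768), ...}

Mechanistic interpretability is then about finding which directions and circuits in those activations correspond to human-legible features - which is the kind of thing Anthropic has been publishing on.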
The intelligence we see in LLMs is to be expected - we're looking in the mirror. They are trained to copy humans, so it's just our own thought patterns and reasoning being output. The LLM is just a "selective mirror" deciding what to output for any given input.
It's mirroring the capability (if not, currently, the executive agency) of being able to convince people to do things. That alone bridges the gap, since social engineering is impossible to patch - harder than fool-proofing models against being jailbroken or used in an adversarial context.