The "world model" of an LLM is just the set of [deep] predictive patterns that it was induced to learn during training. There is no magic here - the model is just trying to learn how to auto-regressively predict training set continuations.
Of course the humans who created the training set samples didn't create them auto-regressively. The samples are artifacts reflecting an external world, and knowledge about it, that the model is not privy to. But the model is limited to minimizing training error on the task it was given - auto-regressive prediction. It has no choice. The "world model" (patterns) it has learnt isn't some magical grokking of that external world - it is just the set of patterns needed to minimize error when attempting to auto-regressively predict training set continuations.
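To make "minimizing training errors" concrete: the entire training signal is next-token cross-entropy, roughly as in this toy sketch in PyTorch (shapes and names are illustrative only, not taken from any particular codebase):

    import torch
    import torch.nn.functional as F

    # Toy illustration of the autoregressive objective: the model's logits at
    # position t are scored against the actual token at position t+1.
    vocab_size, seq_len = 100, 8
    tokens = torch.randint(0, vocab_size, (1, seq_len))   # a "training sample"
    logits = torch.randn(1, seq_len, vocab_size)          # stand-in for model output

    # Shift by one: predict token[t+1] from the prefix ending at token[t].
    pred = logits[:, :-1, :].reshape(-1, vocab_size)
    target = tokens[:, 1:].reshape(-1)
    loss = F.cross_entropy(pred, target)  # the only signal the model is trained on
    print(loss.item())

Everything the model "knows" is whatever happened to push that number down across the training set.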
Whether these training set predictive patterns result in the model performing as you might hope on an unseen text depends on the similarity of that text to samples in the training set.
>Whether these training set predictive patterns result in the model performing as you might hope on an unseen text depends on the similarity of that text to samples in the training set.
>similarity
Yes, except the computer can easily 'see' in far more than 3 dimensions, with more capacity to spot similarities (a toy sketch of what that looks like is below), and it can follow lines of prediction (similar to chess) far deeper than any group of humans can.
That super-human ability to spot similarities and walk latent spaces 'randomly' - yet uncannily - has given rise to emergent phenomena that mimic proto-intelligence.
We have no idea what ideas these tokens have embedded at different layers, or what capabilities can emerge now, at deployment time later, or given a certain prompt.
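Mechanically, "seeing similarity in high dimensions" is little more than measuring angles between embedding vectors. A minimal sketch, with the 768-dimensional size picked arbitrarily for illustration:

    import numpy as np

    # Hypothetical 768-dimensional embeddings; real models work in hundreds to
    # thousands of dimensions, not 3.
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=768), rng.normal(size=768)

    def cosine_similarity(u, v):
        # "Spotting similarity" in embedding space is just an angle measurement.
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine_similarity(a, b))  # near 0 for unrelated random vectors
    print(cosine_similarity(a, a))  # 1.0 for identical vectors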
The inner workings/representations of transformers/LLMs aren't a total black box - there's a lot of work being done (and published) on "mechanistic interpretability", especially by Anthropic.
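As a taste of the most basic starting point for that work, you can already read out a model's intermediate activations yourself. Here's a minimal sketch using forward hooks, with GPT-2 via Hugging Face transformers purely as a convenient small stand-in (Anthropic's published interpretability work is on their own models and far more sophisticated than this):

    import torch
    from transformers import AutoTokenizer, GPT2Model

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2").eval()

    captured = {}

    def save_activation(layer_idx):
        def hook(module, inputs, output):
            # output[0] is the block's hidden state: (batch, seq_len, hidden_dim)
            captured[layer_idx] = output[0].detach()
        return hook

    handles = [block.register_forward_hook(save_activation(i))
               for i, block in enumerate(model.h)]

    with torch.no_grad():
        model(**tokenizer("The capital of France is", return_tensors="pt"))

    for h in handles:
        h.remove()

    print({i: tuple(t.shape) for i, t in captured.items()})  # e.g. {0: (1, 5, 768), ...}

Mechanistic interpretability is then about finding which directions and circuits in those activations correspond to human-legible features - which is the kind of thing Anthropic has been publishing on.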
The intelligence we see in LLMs is to be expected - we're looking in the mirror. They are trained to copy humans, so it's just our own thought patterns and reasoning being output. The LLM is just a "selective mirror" deciding what to output for any given input.
It's mirroring the capability (if not, currently, the executive agency) of being able to convince people to do things. That alone bridges the gap, since social engineering is impossible to patch - harder than fool-proofing models against being jailbroken or used in an adversarial context.