
I feel like extensive pretraining goes against the spirit of generality.

If you can create a general machine that can take 3 examples and synthesize a program that predicts the 4th, you've just solved oracle synthesis. If you train a network on all human knowledge, including puzzle making, and then fine-tune it on 99% of the dataset and give it a dozen attempts for the last 1%, you've just made an expensive compressor for test-maker's psychology.




This betrays a very naive concept of "knowledge" and "understanding". It presupposes that there's some kind of Platonic realm of logic and reason that an AGI just needs to tap into. But ultimately, there can be no meaning, or reasoning, or logic, without context. Matching a pattern of shapes presupposes the concept of a shape, which presupposes a concept of spatial relationships, which presupposes a concept of three- or even two-dimensional space. These things only seem obvious and implicit to you because they permeate the environment that your mind spent hundreds of millions of years evolving to interpret, and then tens of years consuming and processing to understand.

The true test of an AGI is its ability to assimilate disparate information into a coherent world-view, which is effectively what the pretraining is doing. And even then, it is likely that any intelligence capable of doing that will need to be "preloaded" with assumptions about the world it will occupy, structurally. Similar to the regions of the brain which are adept at understanding spatial relationships, or language, or interpreting our senses, etc.


Yes, AGI was here at AlphaGo. People don't like that because they think it should have generalized outside of Go, but when you say AGI was here at AlphaZero, which can play other games, they again say it's not general enough. At this point it seems unlikely that AI will ever be general enough to satisfy the sceptics for the reason you said. There will always be some ___domain that requires training on new data.


You're calling an apple an orange and complaining that everyone else won't refer to it as such. AGI is a computer program that can understand or learn any task a human can, mimicking the cognitive ability of a human.

It doesn't have to actually "think" as long as it can present an indistinguishable facsimile, but if you have to rebuild its training set for each task, that does not qualify. We don't reset human brains from scratch to pick up new skills.


I'm calling a very small orange an orange and people are saying it isn't a real orange because it should be bigger so I show them a bigger orange and they say not big enough. And that continues forever.



Maybe not yet, but what prevents games from getting more complicated and matching rich human environments, requiring rich human-like adaptability? Nothing at all!


But AlphaZero can't play those richer games so it doesn't really matter in this context.


Famous last words!


"AI will ever be general enough to satisfy the sceptics for the reason you said"

Also

People keep thinking "General" means one AI can "do everything that any human can do everywhere all at once".

When really, humans are also pretty specialized. Humans have Years of 'training' to do a 'single job'. And they do not easily switch tasks.


>When really, humans are also pretty specialized. Humans have Years of 'training' to do a 'single job'. And they do not easily switch tasks.

What? Humans switch tasks constantly and incredibly easily. Most "jobs" involve doing so rapidly many times over the course of a few minutes. Our ability to accumulate knowledge of countless tasks and execute them while improving on them is a large part of our fitness as a species.

You probably did so 100+ times before you got to work. Are you misunderstanding the context of what a task is in ML/AI? An AI does not get the default set of skills humans take for granted; it's starting as a blank slate.


You're looking at small tasks.

You don't have a human spend years getting an MBA, then drop them in a Physics Lab and expect them to perform.

But that is what we want from AI: to do 'all' jobs as well as any individual human does their one job.


That is a result we want from AI, it is not the exhaustive definition of AGI.

There are steps of automation that could fulfill that requirement without ever being AGI - it’s theoretically possible (and far more likely) that we achieve that result without making a machine or program that emulates human cognition.

It just so happens that our most recent attempts are very good at mimicking human communication, and thus are anthropomorphized as being near human cognition.


I agree.

I'm just making the point that, when it comes to AI "General" Intelligence, humans are also not as "general" as we assume in these discussions. Humans are also limited in a lot of ways, narrowly trained, make stuff up, etc...

So even a human isn't necessarily a good example of what AGI would mean. A human is not a good target either.


Humans are our only model of the type of intelligence we are trying to develop, any other target would be a fantasy with no control to measure against.

Humans are extremely general. Every single type of thing we want an AGI to do is a type of thing that a human is good at doing, and none of those humans were designed specifically to do that thing. It is difficult for humans to move from specialization to specialization, but we do learn them, with only the structure to "learn, generally" as our scaffolding.

What I mean by this is that we do want AGI to be general in the way a human is. We just want it to be more scalable. Its capacity for learning does not need to be limited by material issues (i.e. physical brain matter constraints), time, or time scale.

So where a human might take 16 years to learn how to perform surgery well, and then need another 12 years to switch to electrical engineering, an AGI should be able to do it the same way, but with the timescale only limited by the amount of hardware we can throw at it.

If it has to be structured from the ground up for each task, it is not a general intelligence, it's not even comparable to humans, let alone scalable beyond us.


So find a single architecture that can be taught to be an electrical engineer or a doctor.

Where today those jobs are being done, but by specialized architectures, models, and combinations of methods.

Then that would be a 'general' intelligence, the one type of model that can do either. Trained to be an engineer or doctor. And like a human, once trained, they might not do the other job well. But they did both start with the same 'tech', like humans all have the same architecture in the 'brain'.

I don't think it will be an LLM, it will be some combo of methods in use today.


Ok, I'll buy that. I'm not sure everyone is using 'general' in that way. I think more often people mean a single AI instance that can do everything, everywhere, all at once: be an engineer and a doctor at the same time. Since it can do all the tasks at the same time, it is 'general'. Since we are making AIs that can do everything, it could have a case statement inside to switch models (half joking). At some point all the different AI methods will be incorporated together and will appear even more human/general.


Right, but even at that point the sceptics will still say that it isn't "truly general" or is unable to do X in the same way a human does. Intelligence, like beauty, is in the eye of the beholder.


But if humans are so bad, what does that say about a model that can't even do what humans can?

Humans are a good target since we know human intelligence is possible; it's much easier to target something that is possible rather than some imaginary intelligence.


No human ever got good at tennis without learning the rules. Why would we not allow an AI to also learn the rules before expecting it to get good at tennis?


> Why would we not allow an AI to also learn the rules before expecting it to get good at tennis?

The model should learn the rules; don't make a model based on the rules. When you make a model based on the rules, it isn't a general model.

Human DNA isn't made to play tennis, but a human can still learn to play it. The same should go for a model: it should learn it; the model shouldn't be designed by humans to play tennis.


So you're saying AI can be incompetent at a grander scale. Got it.


Yes. It can be as good and bad as a human. Humans also make up BS answers.


If the machine can decide how to train itself (adjust weights) when faced with a type of problem it hasn’t seen before, then I don’t think that would go against the spirit of general intelligence. I think that’s basically what humans do when they decide to get better at something, they figure out how to practice that task until they get better at it.


In-context learning is a very different problem from regular prediction. It is quite simple to fit a stationary solution to noisy data; that's just a matter of tuning some parameters with fairly even gradients. In-context learning implies you're essentially learning a mesa-optimizer for the class of problems you're facing, which in the case of transformers essentially means fitting something not that far from a differentiable Turing machine with no inductive biases.
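
To make the contrast concrete, here is a toy sketch (Python/numpy, names and setup are mine, purely illustrative): ordinary fitting tunes fixed parameters against one noisy dataset, while an in-context learner has to map any fresh set of example pairs to a prediction at inference time, i.e. it has to run a learning procedure inside its forward pass.

  import numpy as np

  # Ordinary (stationary) fitting: tune fixed parameters once, on one noisy dataset.
  def fit_linear(x, y):
      # least-squares slope/intercept for y ~ w*x + b
      A = np.stack([x, np.ones_like(x)], axis=1)
      w, b = np.linalg.lstsq(A, y, rcond=None)[0]
      return w, b

  # In-context learning: the "model" receives the examples at inference time and
  # must produce a prediction for a new query -- effectively running a small
  # learning algorithm (here, the same least-squares fit) as part of prediction.
  def in_context_predict(context_x, context_y, query_x):
      w, b = fit_linear(context_x, context_y)  # a mesa-optimizer in miniature
      return w * query_x + b

  rng = np.random.default_rng(0)
  x = rng.uniform(-1, 1, 20)
  y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 20)
  print(in_context_predict(x, y, query_x=0.25))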


Exactly. That's basically the problem with a lot of the current paradigm, they don't allow true generalisation. That's why some people say there won't be any AGI anytime soon: https://www.lycee.ai/blog/why-no-agi-openai


"true generalisation" isn't really something a lot of humans can do.


  > humans CAN do.
I think people often get confused by claims like:

  - Humans CAN generalize
  - Humans CAN reason
  - Humans CAN be intelligent
  - Humans CAN be conscious
Generalization[0] is something MOST humans CAN do, but MOST humans DO NOT do. Do not confuse "can" and "are".

One of my pet peeves is how often qualifying words are just ignored. They are critical parts of any communication.[1]

Another pet peeve is over-anthropomorphization. Anthropomorphism is a useful tool, but... well... we CAN over-generalize ;)

[0] I don't know what you mean by "true generalization". I'm not going to address that because you can always raise the bar for what is "true"; let's try to be more concrete instead. Maybe I misunderstand. I definitely misunderstand.

[1] Classic example: someone says "most x are y" and then there's a rebuttal of "but x_1 isn't y" or "I'm x and I'm not y" or some variation. Great! Most isn't all? This is not engaging in good faith, and there are examples like this for any qualifying word. It is quite common to see.


The thing is, LLMs don't even do the kind of generalisation the dumbest human can do, while simultaneously doing some stuff the smartest human probably can't.


Describes computers in general pretty well


Moravec talked about this kind of thing in the '80s.


I’m interested to know more. Any specifics, or should I just look it up?



Thank you.


Sadly, a lot of humans choose not to think. We're in an age of willful ignorance.


i feel like you're overgeneralizing here


I think that most human learning comes from years of sensory input. Why should we expect a machine to generalize well without any background?


Newborns (and certainly toddlers) seem to understand the underlying concepts for these things when it comes to visual/haptic object identification and "folk physics":

  A short list of abilities that cannot be performed by CompressARC includes:

  Assigning two colors to each other (see puzzle 0d3d703e)
  Repeating an operation in series many times (see puzzle 0a938d79)
  Counting/numbers (see puzzle ce9e57f2)
  Translation, rotation, reflections, rescaling, image duplication (see puzzles 0e206a2e, 5ad4f10b, and 2bcee788)
  Detecting topological properties such as connectivity (see puzzle 7b6016b9)
Note: I am not saying newborns can solve the corresponding ARC problems! The point is there is a lot of evidence that many of the concepts ARC-AGI is (allegedly) measuring are innate in humans, and maybe most animals; e.g. cockroaches can quickly identify connected/disconnected components when it comes to pathfinding. Again, not saying cockroaches can solve ARC :)

OTOH even if orcas were smarter than humans they would struggle with ARC - it would be way too baffling and obtuse if your culture doesn't have the concept of written standardized tests. (I have been solving state-mandated ARC-ish problems since elementary school.) This also applies to hunter-gatherers, and note the converse: if you plopped me down among the Khoisan in the Kalahari, they would think I was an ignorant moron. But it makes as much sense scientifically to say "human-level intelligence" entails "human-level hunter-gathering" as it does "human-level IQ problems."


> there is a lot of evidence that many of the concepts ARC-AGI is (allegedly) measuring are innate in humans

I'd argue that "innate" here still includes a brain structure/nervous system that evolved on 3.5 billion years worth of data. Extensive pre-training of one kind or another currently seems the best way to achieve generality.


So all the billions spent finding tricks and architectures that perform well haven't resulted in any durable structures in contemporary LLMs?

Each new training from scratch is a perfect blank slate and the only thing ensuring words come out is the size of the corpus?


> Each new training from scratch is a perfect blank slate [...]?

I don't think training runs are done entirely from scratch.

Most training runs in practice will start from some pretrained weights or distill an existing model - taking some model pretrained on ImageNet or Common Crawl and fine-tuning it to a specific task.

But even when the weights are randomly initialized, the hyperparameters and architectural choices (skip connections, attention, ...) will have been copied from previous models/papers by what performed well empirically, sometimes also based on trying to transfer our own intuition (like stacking convolutional layers as a rough approximation of our visual system), and possibly refined/mutated through some grid search/neural architecture search on data.
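
As a rough (hypothetical) illustration of the first, weights-reuse case, a typical fine-tuning run might look something like the sketch below, assuming PyTorch and torchvision are available; the class count is a placeholder and the data loading is omitted:

  import torch
  import torchvision

  # Start from weights pretrained on ImageNet rather than from random initialization.
  model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

  # Freeze the pretrained backbone...
  for p in model.parameters():
      p.requires_grad = False

  # ...and replace the classification head for the new task (placeholder class count).
  num_classes = 10
  model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

  # Only the new head's parameters are updated during fine-tuning.
  optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)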


Sure and LLMs ain’t nothing of this sort. While they’re an incredible feat in technology, they’re just a building block for intelligence, an important building block I’d say.


Newborn brains aren't blank, they are complex beyond our current ability to understand. All mammals are born with a shocking amount of instinctual knowledge built right into their genome.

All organisms are born pre-trained because if you can't hide or survive the moment you're born, you get eaten.


> if you can't hide or survive the moment you're born, you get eaten.

uhhh... no, most newborns can't "hide or survive the moment they're born", no matter the species. I'm sure there are a few examples, but I seriously doubt it's the norm.

Many species survive by reproducing en masse, where it takes many (sometimes thousands of) eaten offspring for one to survive to adulthood.


In humans at least, they lack the motor control to attempt to hide, gather, or hunt, obviously. But they do plenty of other stuff instinctively. With my latest, we learned that infants are inherently potty trained (defecation) if you pay attention to the cues… and I was surprised to find that that was true: the baby communicates the need to go and knows what’s happening without any training at all. Amazed to have almost zero soiled diapers at one month.

Makes sense though; I’m pretty sure mammals don’t do well with the insects and diseases that come with a waste-saturated bed.


The point clearly still stands: every species on the planet has a long list of attributes and behaviors directly attributable to evolution and “pretraining.” And many, many more based on education (the things a lioness teaches her cubs.)

I’m not sure we would call anyone intelligent today if they had no education. Intelligence relies on building blocks that are only learned, and the “masters” of certain fields are drawing on decades and decades of learnings about their direct experience.

So our best examples of intelligence include experience, training, knowledge, evolutionary factors, what have you — so we probably need to draw on that to create a general intelligence. How can we expect to have an intelligence in a certain field if it hasn’t spent a lot of time “ruminating on”/experiencing/learning about/practicing/evolving/whatever, on those types of problems?


Please do some research into developmental psychology. Babies are far, far more sophisticated than you seem to believe.


Please have a baby and report back first-hand observations. Yes, they're far, far more sophisticated than most (all?) humans can comprehend, but they're also completely incapable for multiple months after birth. This isn't unexpected; human babies are born at what would still be mid-to-late gestation in almost any other mammal.

That quote about how "the only intuitive interface ever devised was the nipple"? Turns out there's still a fair bit of active training required all around to even get that going. There's no such thing as intuitive, only familiar.


Have you ever seen a human baby attempt to hide upon birth? No? Thought so.


> Newborns (and certainly toddlers) seem to understand the underlying concepts for these things when it comes to visual/haptic object identification and "folk physics"

Yes, they enjoy millions of years of pretraining thanks to evolution, i.e. their pretrained base model has some natural propensity for visual, auditory, and tactile sensory modalities, and some natural propensity for spatial and causal reasoning.


I'd guess it's because we don't want to have another human. We want something better. Therefore, the expectations on the learning process are way beyond what humans do. I guess some are expecting some magic word (formula) which would be like a seed with unlimited potential.

So like humans after all but faster.

I guess it's just hard to write a book about the way you write that book.


It does but it also generalizes extremely well


I haven't seen a convincing argument that it is more sample efficient than a neural network that has seen an equivalent amount of lifetime information. Yann LeCun gave an interesting explanation of how even in the first few years of a child's life, they have seen much more information than the largest pretrained models have.


ARC is equivalent to a distribution over four-tuples of images -- with no prior, the last image is uniformly distributed given the first three...
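
In symbols (my notation), the claim is that without any prior constraining the joint distribution over puzzles,

  p(x_4 \mid x_1, x_2, x_3) = \frac{p(x_1, x_2, x_3, x_4)}{p(x_1, x_2, x_3)}

is uniform over the space of possible grids, so any predictive power has to come from a prior, whether built in or learned during pretraining.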


The issue is that general intelligence is useless without vast knowledge. The pretraining is the knowledge, not the intelligence.


With long enough context sizes, AGI is not useless without vast knowledge. You could always put a bootstrap sequence into the context (think Arecibo Message), followed by your prompt. A general enough reasoner with enough compute should be able to establish the context and reason about your prompt.


Yes, but that just effectively recreates the pretraining. You're going to have to explain everything down to what an atom is, and essentially all human knowledge if you want to have any ability to consider abstract solutions that call on lessons from foreign domains.

There's a reason people with comparable intelligence operate at varying degrees of effectiveness, and it has to do with how knowledgeable they are.


Would that make in-context learning a superset or a subset of pretraining?

This paper [0] claimed transformers learn a gradient-descent mesa-optimizer as part of in-context learning while being guided by the pretraining objective, and, as the parent mentioned, any general reasoner can bootstrap a world model from first principles.

[0] https://arxiv.org/pdf/2212.07677


> Would that make in-context learning a superset or a subset of pretraining?

I guess a superset. But it doesn't really matter either way. Ultimately, there's no useful distinction between pretraining and in-context learning. They're just an artifact of the current technology.


Isn't knowledge of language necessary to decode prompts?


0 1 00 01 10 11 000 001 010 011 100 101 110 111 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110

And no, I don't think knowledge of language is necessary. To give a concrete example, tokens from the TinyStories dataset (the dataset size is ~1GB) are known to be sufficient to bootstrap basic language.
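
For what it's worth, a sequence like the one above is trivial to generate; here is a minimal Python sketch (mine, just to make the "bootstrap sequence" idea concrete):

  from itertools import count, product

  def binary_strings():
      # Enumerate all binary strings in shortlex order: 0, 1, 00, 01, 10, 11, 000, ...
      for length in count(1):
          for bits in product("01", repeat=length):
              yield "".join(bits)

  gen = binary_strings()
  print(" ".join(next(gen) for _ in range(30)))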


I'm not at all experienced in neuroscience, but I think that humans and other animals primarily gain intelligence by learning from their sensory input.


You don't think a lot is encoded in genes from before we're born?


>a lot

This is pretty vague. I certainly don't think mastery of any concept invented in the last thousand years would be considered encoded in genes, though we would want or expect an AGI to be able to learn calculus, for instance. In terms of "encoded in genes", I'd say most of what is asked or expected of AGI goes beyond what feral children (https://en.wikipedia.org/wiki/Feral_child) were able to demonstrate.


I don't disagree, but I think there is much more information encoded in the brain. I believe this phenomenon is called the genomic bottleneck.

There are a few orders of magnitude more neural connections in a human than there are base pairs in a human genome. I would also assume that there are more than 4 possible ways for neural connections to be formed, while there are only 4 possible base pairs. Also, most genetic information corresponds to lower level functions.
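
A rough back-of-the-envelope version of that gap, using commonly cited ballpark figures (not exact numbers):

  # Very rough, commonly cited approximations -- not exact figures.
  base_pairs = 3.2e9      # human genome: ~3.2 billion base pairs
  bits_per_base = 2       # 4 possible bases -> 2 bits each
  genome_bits = base_pairs * bits_per_base   # ~6.4e9 bits, i.e. under a gigabyte

  synapses = 1e14         # human brain: often estimated at ~100 trillion connections

  print(f"genome capacity: ~{genome_bits / 8 / 1e9:.1f} GB")
  print(f"synapses per genome bit: ~{synapses / genome_bits:,.0f}")

At that ratio the genome can't be a literal wiring diagram, which is roughly the genomic bottleneck argument.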


I don't think so. A lot of useful specialized problems are just patterns. Imagine your IDE could take 5 examples of matching strings and produce a regex you could count on working. It doesn't need to know the capital of Togo, metabolic pathways of the eukaryotic cell, or human psychology.

For that matter, if it had no pre-training, that means it can generalize to new programming languages, libraries, and entirely new tasks. You could use it to analyze the grammar of a dying African language, write stories in the style of Hemingway, or diagnose cancer from patient data. In all of these, there are only so many samples to fit on.
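
A toy sketch of what that IDE feature could look like (entirely illustrative; the names are mine, and a real tool would search a far richer pattern space than this fixed candidate pool):

  import re

  # A deliberately tiny pool of candidate patterns to search over (illustrative only).
  CANDIDATES = [
      r"\d+",
      r"[a-z]+",
      r"[A-Za-z]+",
      r"\d{4}-\d{2}-\d{2}",
      r"\w+@\w+\.\w+",
  ]

  def infer_regex(positives, negatives=()):
      """Return the first candidate matching every positive and no negative example."""
      for pattern in CANDIDATES:
          rx = re.compile(pattern)
          if all(rx.fullmatch(s) for s in positives) and not any(rx.fullmatch(s) for s in negatives):
              return pattern
      return None

  print(infer_regex(["2021-05-03", "1999-12-31", "2030-01-01"]))  # -> \d{4}-\d{2}-\d{2}

Fitting on a handful of samples like this is the "oracle synthesis" framing from the top of the thread, just at a very small scale.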


Of course, none of us have exhaustive knowledge. I don't know the capital of Togo.

But I do have enough knowledge to know what an IDE is, and where that sits in a technological stack; I know what a string is, and all that it relies on, etc. There's a huge body of knowledge that is required to even begin approaching the problem. If you posed that challenge to an intelligent person from 2,000 years ago, they would just stare at you blankly. It doesn't matter how intelligent they are; they have no context to understand anything about the task.


> If you posed that challenge to an intelligent person from 2,000 years ago, they would just stare at you blankly.

Depending on how you pose it. If I give you a long enough series of ordered cards, you'll on some basic level begin to understand the spatiotemporal dynamics of them. You'll get the intuition that there's a stack of heads scanning the input, moving forward each turn, either growing the mark, falling back, or aborting. If not constrained by using matrices, I can draw you a state diagram, which would have much clearer immediate metaphors than colored squares.

Do these explanations correspond to some priors in human cognition? I suppose. But I don't think you strictly need them for effective few-shot learning. My main point is that learning itself is a skill, which generalist LLMs do possess, but only as one of their competencies.


Well Dr. Michael Levin would agree with you in the sense that he ascribes intelligence to any system that can accomplish a goal through multiple pathways. So for instance the single-celled Lacrymaria, lacking a brain or nervous system, can still navigate its environment to find food and fulfill its metabolic needs.

However, I assumed what we're talking about when we discuss AGI is what we'd expect a human to be able to accomplish in the world at our scale. The examples of learning without knowledge you've given, to my mind at least, are a lower level of intelligence that doesn't really approach human level AGI.


> A lot of useful specialized problems are just patterns.

> It doesn't need to know the capital of Togo, metabolic pathways of the eukaryotic cell, or human psychology.

What if knowing those things distills down to a pattern that matches a pattern of your code and vice versa? There's a pattern in everything, so know everything, and be ready to pattern match.

If you just look at object oriented programming, you can easily see how knowing a lot translates to abstract concepts. There's no reason those concepts can't be translated bidirectionally.


In fact, a couple years ago they were saying that training models on code translated to better logical thinking in other domains "for free"


> The pretraining is the knowledge, not the intelligence.

I thought the knowledge was the training set and the intelligence was the emergent side effect of reproducing that knowledge while making sure the reproduction is not rote memorisation?


I'd say that it takes intelligence to encode knowledge, and the more knowledge you have, the more intelligently you can encode further knowledge, in a virtuous cycle. But once you have a data set of knowledge, there's nothing to emerge, there are no side effects. It just sits there doing nothing. The intelligence is in the algorithms that access that encoded knowledge to produce something else.


The data set is flawed, noisy, and its pieces are disconnected. It takes intelligence to correct its flaws and connect them parsimoniously.


It takes knowledge to even know they're flawed, noisy, and disconnected. There's no reason to "correct" anything unless you have knowledge that applying previously "understood" data has in fact produced deficient results in some application.

That's reinforcement learning -- an algorithm that requires accurate knowledge acquisition to be effective.


Every statistical machine learning algorithm, including RL, deals with noisy data. The process of fitting aims to remove the sampling noise, revealing the population distribution, thereby compressing it into a model.

The argument being advanced is that intelligence is the proposal of more parsimonious models, aka compression.
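
That is essentially the minimum description length view; in symbols (standard MDL notation, not from the thread):

  \hat{M} = \arg\min_{M} \, \big[ L(M) + L(D \mid M) \big]

where L(M) is the cost in bits of describing the model and L(D | M) the cost of describing the data given the model; a more parsimonious model is one that shrinks the total.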


I've lost track of what we're disagreeing about.


  > I feel like extensive pretraining goes against the spirit of generality.
What do you mean by generality?

Pretraining is fine. It is even fine in the pursuit of AGI. Humans and every other animal have "baked in" memory. You're born knowing how to breathe and have latent fears (chickens and hawks).

Generalization is the ability to learn on a subset of something and then adapt to the entire (or a much larger portion) of the superset. It's always been that way. Humans do this, right? You learn some addition, subtraction, multiplication, and division and then you can do novel problems you've never seen before that are extremely different. We are extremely general here because we've learned the causal rule set. It isn't just memorization. This is also true for things like physics, and is literally the point of science. Causality is baked into scientific learning. Of course, it is problematic when someone learns a little bit of something and thinks they know way more about it, but unfortunately ego is quite common.

But also, I'm a bit with you. At least with what I think you're getting at. These LLMs are difficult to evaluate because we have no idea what they're trained on and you can't really know what is new, what is a slight variation from the training, and this is even more difficult considering the number of dimensions involved (meaning things may be nearly identical in latent space though they don't appear so to us humans).

I think there's still a lot of ML/AI research that can and SHOULD be done at smaller scales. We should be studying more about this adaptive learning, and not just in the RL setting. One major gripe I have with the current research environment is that we are not very scientific when designing experiments. They are highly benchmark/dataset-evaluation focused, but evaluation needs to go far beyond test cases. I'll keep posting this video of Dyson recounting his work being rejected by Fermi [0][1]. You have to have a good "model".

What I do not see happening in ML papers is proper variable isolation and evaluation based on it: i.e. hypothesis testing. Most papers I see do not provide substantial evidence for their claims. It may look like they do, but the devil is always in the details. When doing extensive hyper-parameter tuning it becomes very difficult to determine if the effect is something you've done via architectural changes, changes in the data, changes in training techniques, or changes in hyperparameters. To do a proper evaluation would require huge ablations with hold-one-out style scores reported. This is obviously too expensive, but the reason it gets messy is the concentration on getting good scores on whatever evaluation dataset is popular. But you can show a method's utility without beating others! This is a huge thing many don't understand.

Worse, by changing hyper-parameters to optimize for the test-set result, you are doing information leakage. Anything you change based on the result of the evaluation set is, by definition, information leakage (a toy sketch of this appears below, after the footnotes). We can get into the nitty gritty to prove why this is, but it is just common practice these days. It is the de facto method and yes, I'm bitter about this. (A former physicist who came over to ML because I loved the math and Asimov books.)

[0] https://www.youtube.com/watch?v=hV41QEKiMlM

[1] I'd also like to point out that Dyson notes that the work was still __published__. Why? Because it still provides insights, and the results are useful to people even if the conclusions are bad. Modern publishing seems to be more focused on novelty and is highly misaligned with scientific progress. Even repetition helps. It is information gain. But repetition yields lower information gain with each iteration. You can't determine correctness by reading a paper; you can only determine correctness by repetition. That's the point of science, right? As I said above? Even negative results are information gain! (Sorry, I can rant about this a lot.)
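
As promised above, a minimal sketch of the leakage point (numpy only, synthetic data, my own construction): tuning a hyperparameter on a held-out validation split versus tuning it on the test set. The second reported number is optimistically biased, because test-set information leaked into the hyperparameter choice.

  import numpy as np

  rng = np.random.default_rng(0)

  def ridge_fit(X, y, alpha):
      # Closed-form ridge regression: w = (X^T X + alpha*I)^-1 X^T y
      d = X.shape[1]
      return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

  def mse(X, y, w):
      return float(np.mean((X @ w - y) ** 2))

  # Synthetic data, split into train / validation / test.
  X = rng.normal(size=(300, 20))
  w_true = rng.normal(size=20)
  y = X @ w_true + rng.normal(scale=2.0, size=300)
  X_tr, y_tr = X[:150], y[:150]
  X_va, y_va = X[150:225], y[150:225]
  X_te, y_te = X[225:], y[225:]

  alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

  # Proper protocol: choose alpha on the validation split, report the test score once.
  best_alpha = min(alphas, key=lambda a: mse(X_va, y_va, ridge_fit(X_tr, y_tr, a)))
  print("test MSE (alpha tuned on val): ", mse(X_te, y_te, ridge_fit(X_tr, y_tr, best_alpha)))

  # Leaky protocol: choose alpha by peeking at the test score. Anything changed based
  # on the evaluation set's result is, by definition, information leakage.
  leaky_alpha = min(alphas, key=lambda a: mse(X_te, y_te, ridge_fit(X_tr, y_tr, a)))
  print("test MSE (alpha tuned on test):", mse(X_te, y_te, ridge_fit(X_tr, y_tr, leaky_alpha)))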



