He probably didn't need petabytes of reddit posts and millions of gpu-hours to parrot that though.
I still don't buy the "we do the same as LLMs" discourse. Of course one could hypothesize that the human brain's language centers have some similarities to LLMs, but the differences in how many resources humans and LLMs need, and in how those resources are used during training, are remarkable and may indicate otherwise.
>Not text, he had petabytes of video, audio, and other sensory inputs. Heck, a baby sees petabytes of video before first word is spoken
A 2-3 year old in a rural village in 1800 could already speak, having seen only its cradle for the first month or so, its parents' hut for a few months after that, and maybe parts of the village later on.
Hardly "petabytes of training video" to write home about.
You are funny. Clearly your expertise with babies comes from reading books about history or science, rather than ever having interacted with one…
What resolution of screen do you think you would need before you could no longer distinguish it from reality? For me personally, a very conservative estimate is an order of magnitude above a 10-by-10 wall of 4K screens, so on the order of 1,000 4K streams. If a typical 2h of 4K video is ~50GB, that gives about half a petabyte per 24h (even with eyes closed). Just raw, unlabeled vision data.
A baby probably has significantly lower resolution, but then again, what is the equivalent resolution of the skin and the other sense organs?
So yes, petabytes of data within the first days of existence - likely even before being born, since a baby can hear inside the uterus, for example.
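To make the arithmetic explicit, a quick Python sketch using the rough figures above (the screen count and the ~50GB/2h bitrate are assumptions, not measurements):

    # Rough estimate of raw visual data per day, using the assumed figures above.
    GB = 10**9
    PB = 10**15

    screens_4k = 1_000     # assumption: ~an order of magnitude above a 10x10 wall of 4K screens
    gb_per_2h = 50         # assumption: ~50 GB for 2 hours of 4K video
    hours_per_day = 24     # "even with eyes closed"

    gb_per_stream_per_day = gb_per_2h * (hours_per_day / 2)   # 600 GB per stream per day
    bytes_per_day = screens_4k * gb_per_stream_per_day * GB

    print(f"~{bytes_per_day / PB:.1f} PB per day")             # ~0.6 PB per day
    print(f"~{PB / bytes_per_day:.1f} days to the first PB")   # under 2 days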
And it's very high-signal data: as you've stated yourself ("nothing to write home about"), it's mainly seeing mom and dad; and it's high-signal from a feedback-loop point of view too - a baby never tells you it is hungry subtly.
No, they don't - they don't have the hardware yet. But they do parrot: they drive outputs to e.g. their muscles that induce the expected visual sensory inputs in response, in a way that mimics the video input of "other people doing things".
And yet, with multiple orders of magnitude more data, he still didn't cost millions of dollars to train, nor multiple lifetimes in GPU-hours. He probably didn't even register all the petabytes passing through his "sensors"; those are characteristics we are not even close to understanding, much less replicating.
Whatever is happening in the brain is more complex, as the performance/cost ratio is stupidly better for humans on a lot of tasks, in both training and inference* (rough energy sketch below).
*When considering all modalities: o3 can't even do ARC-AGI in vision mode, only on JSON representations. So much for "omni".
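For a sense of scale on the training-cost point, a back-of-the-envelope energy comparison (every number is an assumption for illustration: ~20W is the commonly cited power draw of a human brain, ~700W a modern datacenter GPU under load, and "millions of GPU-hours" is taken from upthread):

    # Back-of-the-envelope energy comparison; all figures are rough assumptions.
    HOURS_PER_YEAR = 8_766

    brain_watts = 20          # assumption: commonly cited human brain power draw
    years = 20                # assumption: "training time" to adulthood
    human_kwh = brain_watts * years * HOURS_PER_YEAR / 1_000   # ~3,500 kWh

    gpu_watts = 700           # assumption: modern datacenter GPU under load
    gpu_hours = 2_000_000     # assumption: "millions of GPU-hours"
    llm_kwh = gpu_watts * gpu_hours / 1_000                    # ~1,400,000 kWh

    print(f"human: ~{human_kwh:,.0f} kWh vs LLM: ~{llm_kwh:,.0f} kWh, "
          f"roughly {llm_kwh / human_kwh:.0f}x")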
>Everything you said is parroting data you’ve trained on
"Just like" an LLM, yeah sure...
Like how the brain was "just like" a hydraulic system (early industrial era), "just like" a clockwork of gears and differentials (the mechanical engineering era), "just like" an electric circuit (Edison's time), "just like" a computer CPU (21st century), and so on...