MKBHD's review of the new Sora release: https://www.youtube.com/watch?v=OY2x0TyK...

laweijfmvo · 2024-12-09T18:12:53 1733767973

Love the callout of them definitely training on his own videos

danpalmer · 2024-12-09T23:20:10 1733786410

...which they shouldn't have been able to get? I had thought that it was against the YouTube ToS? (my personal understanding, unrelated to my employer)

Havoc · 2024-12-10T01:21:53 1733793713

AI companies don’t give a shit about ToS. Hell, most of the big players actively ignored copyright entirely, in bulk. See the thousands upon thousands of pirated books in the Pile dataset.

And right after that news broke they “fixed” the problem by no longer disclosing training data sources. That’s why papers for early models, e.g. Llama 1, listed this and now nobody does. It’s just an unspoken yet open secret now.

potamic · 2024-12-10T10:27:51 1733826471

How did they get access to pirated books?

leobg · 2024-12-14T18:12:41 1734199961

Anna’s Archive has files specifically for training LLMs. But I’d guess the big players secured their share beforehand, by scraping those sites. I have zero proof, it’s just a guess.

aprilthird2021 · 2024-12-10T07:14:55 1733814895

The companies are fairly brazen, at least internally, about just scraping whatever, wherever and not caring about ToS of any website. All they really care about is blocking "bad" data that might make the models racist or sexual, etc.

paxys · 2024-12-10T03:28:25 1733801305

If AI companies respected ToS there would be no AI

awongh · 2024-12-09T18:25:23 1733768723

Interesting to see how bad the physics/object permanence is. I wonder if combining this with a Genie 2 type model (Google's new "world model") would be the next step in refining its capabilities.

torginus · 2024-12-09T18:57:49 1733770669

This feels like computer graphics and the 'screen space' techniques that got introduced in the Xbox 360 generation - reflection, shadows etc. all suffered from the inability to work with off screen information and gave wildly bad answers once off screen info was required.

The solution was simple - just maintain the information in world space, and sample from that. But simple does not mean cheap, and it led to a ton of redundant (as in invisible in the final image) information having to be kept track of.

kranke155 · 2024-12-09T18:31:59 1733769119

Until these models can figure out physics, it seems to me they will be an interesting toy

andybak · 2024-12-09T18:41:35 1733769695

They can figure out a fair bit of physics. It's not a "no physics" vs "physics" thing. Rather it's a "flawed and unreliable physics" thing.

It's similar to the LLM hallucination problem. LLMs produce nonsense and untruths - but they are still useful in many domains.

Barrin92 · 2024-12-09T19:16:56 1733771816

It's a pretty binary thing in the sense that "bad physics" pretty quickly decoheres into no physics.

I saw one of these models doing a Minecraft-like simulation and it looked sort of okay, but then water started to end up in impossible places, and once it was there it kept spreading and you ended up in some Lovecraftian horror dimension. Any useful physics simulation at least needs boundary conditions to hold, and these models have no boundary conditions because they have no clear categories of anything.

kylehotchkiss · 2024-12-09T21:54:42 1733781282

But they don't, they just understand pixel relationships (right?)

potatoman22 · 2024-12-10T02:56:49 1733799409

You can model a lot of basic physics through observing 1,000,000 videos

kranke155 · 2024-12-12T11:53:19 1734004399

Here’s an idea - what if the fact that we have a body that has weight and consequence helps us understand physics? What if just visual data won’t get there because visual data lacks the sense of self? Could be interesting

kranke155 · 2024-12-12T11:49:54 1734004194

Not consistently though. I think some model of understanding of physics is emergent but it doesn’t seem emergent enough. The model doesn’t understand object permanence either.

replwoacause · 2024-12-10T15:06:15 1733843175

I quit watching this guy after he filmed himself speeding at 100 mph through a residential area. Just another privileged YouTuber.