
How could one hour of video fit in 1M tokens? 1 hour at 30fps is 3600*30=108,000 frames. Each frame is converted into 256 tokens, so processing every frame would take 108,000*256, roughly 27.6M tokens. So either they are not processing each frame, or each frame is converted into fewer tokens.



The model can probably perform fine at 1 frame per second (3600*256=921,600 tokens, which fits just under 1M), and they could probably use some sort of compression on top of that.
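For anyone who wants to check the arithmetic, here is a quick sketch. It assumes the figures quoted in this thread (256 tokens per frame, a 1M-token context) and a fixed per-frame cost with no compression; the function names are made up for illustration and say nothing about how the model actually samples video.

```python
# Assumed figures from the thread, not confirmed model internals.
TOKENS_PER_FRAME = 256
CONTEXT_TOKENS = 1_000_000


def video_tokens(seconds: int, fps: float,
                 tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Total tokens if every sampled frame costs a fixed number of tokens."""
    return int(seconds * fps * tokens_per_frame)


def max_fps(seconds: int, budget: int = CONTEXT_TOKENS,
            tokens_per_frame: int = TOKENS_PER_FRAME) -> float:
    """Highest sampling rate that still fits the whole clip in the budget."""
    return budget / (seconds * tokens_per_frame)


hour = 3600
print(video_tokens(hour, 30))    # 27648000, far over 1M
print(video_tokens(hour, 1))     # 921600, just under 1M
print(round(max_fps(hour), 3))   # 1.085 frames per second at most
```

So under these assumptions the budget tops out at about 1.08 fps for a full hour, which is consistent with the 1 fps guess.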



