
A multimodal LLM is a general-purpose device for churning sensor inputs into a sequence of close-to-optimal decisions. The 'language' part is there to reduce the friction of the interface with humans; it's not an inherent limitation of the LLM. It's not too far-fetched to imagine a scenario where you point at a guy in a crowd and tell a drone to go get him, and the drone figures out a close-to-optimal sequence of decisions to make it so. (A rough sketch of that loop is below.)
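As a minimal sketch of that sense-decide-act loop, with everything hypothetical (query_multimodal_model, Drone, and the action format are assumptions for illustration, not any real API):

    # Hypothetical perception-to-action loop around a multimodal model.
    import json
    import time

    def query_multimodal_model(image_bytes, instruction):
        # Stand-in for a call to some multimodal LLM; assumed to return JSON
        # like {"action": "move", "heading_deg": 90, "speed": 1.5} or {"action": "stop"}.
        raise NotImplementedError

    def control_loop(drone, instruction, hz=2):
        while True:
            frame = drone.capture_frame()  # raw sensor input (camera, etc.)
            decision = json.loads(query_multimodal_model(frame, instruction))
            if decision["action"] == "stop":
                break
            drone.execute(decision)        # actuate the model's chosen step
            time.sleep(1.0 / hz)

The point is that the model's output here is consumed by a controller, not a human; the natural-language instruction is just the entry point.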


