100% of the time when I post a critique, someone replies with this. I tell them I've used literally every LLM under the sun, quite a bit, trying to find any use I can think of, and then it's immediately crickets.
RT-2 is a vision-language model fine-tuned to take the current vision input and produce actuator positions as output. Google uses a bunch of TPUs to produce a full response at a cycle rate of 3 Hz, and the VLM has learned the kinematics of the robot and knows how to pick up objects according to given instructions.
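To make the loop concrete, here is a minimal sketch of that kind of perceive-act cycle. Every name in it (`robot`, `vlm_policy`, `decode_action`) is hypothetical; RT-2's actual serving stack is not public, so treat this as an illustration of the idea, not Google's implementation.

```python
import time

CYCLE_HZ = 3  # the ~3 Hz cycle rate mentioned above

def decode_action(tokens):
    """Hypothetical: map the VLM's discretized action tokens
    back to continuous actuator positions."""
    return [t / 255.0 for t in tokens]  # placeholder de-tokenization

def control_loop(robot, vlm_policy, instruction):
    """Closed-loop control: image + instruction in, actuator positions out.
    `robot` and `vlm_policy` are stand-ins for the hardware and model APIs."""
    period = 1.0 / CYCLE_HZ
    while not robot.task_done():
        start = time.monotonic()
        image = robot.camera.read()  # current vision input
        # The fine-tuned VLM conditions on the image plus the natural-language
        # instruction and emits action tokens instead of text tokens.
        tokens = vlm_policy.generate(image=image, text=instruction)
        robot.apply(decode_action(tokens))  # drive the actuators
        # Sleep off the remainder of the cycle to hold the rate.
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```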
Given the current rate of progress, we will have robots that can learn simple manual labor from human demonstrations (e.g., YouTube as a dataset; no, I do not mean bimanual teleoperation) by the end of the decade.
Usually when I encounter sentiment like this, it is because they have only used 3.5 (evidently not the case here) or because their prompting is terrible or misguided.
When I show a lot of people GPT-4 or Claude, some percentage of them jump right to "What year did Nixon get elected?" or "How tall is Barack Obama?" and then kind of shrug with a "Yeah, Siri could do that ten years ago" take.
Beyond that you have people who prompt things like "Make a stock market program that has tabs for stocks, and shows prices" or "How do you make web cookies". Prompts that even a human would struggle greatly with.
For the record, I use GPT-4 and Claude, and both have dramatically boosted my output at work. They are powerful tools; you just have to get used to massaging good output from them.
That is not the reality today. If you want good results from an LLM, then you do need to speak LLM. Just because they appear to speak English doesn't mean they act like a human would.
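To make "speaking LLM" concrete, here is a minimal sketch using the OpenAI Python client; the model name and both prompts are illustrative, not a recommendation. The vague prompt is the kind people type and then shrug at; the second spells out context, constraints, and an output format.

```python
# Same task, two prompts: one vague, one specific.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The kind of prompt that produces shrug-worthy output:
vague = "Make a stock market program that has tabs for stocks, and shows prices"

# The same request with context, constraints, and an output format:
specific = (
    "Write a Python script using tkinter with a ttk.Notebook. "
    "Each tab is one ticker from the list ['AAPL', 'MSFT']. "
    "Each tab shows the latest price from a get_price(ticker) function "
    "that I will supply; stub it out to return a fixed float for now. "
    "Return only the code, no explanation."
)

for prompt in (vague, specific):
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)
```

The second prompt is longer, but every added clause removes a decision the model would otherwise have to guess at.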
People don’t even know how to use traditional web search properly.
Here’s a real scenario: a Citrix virtual desktop crashed because a recent critical security fix forced an upgrade of a shared DLL. The crash produced a really specific set of errors in a stack trace. I watched with my own two eyes as an IT professional typed the following phrase into Google: “Why did my PC crash?”
Then he sat there and started reading through each result… including blog posts by random kids complaining about Windows XP.
I wish I could say this kind of thing is an isolated incident.
I mean, you need to speak German to talk to a German. It’s not really much different for LLMs: just because the language they speak has roots in English doesn’t mean it actually is English.
And even if it were, there are plenty of people who are completely unintelligible in English too…
You see no difference between non-RLHF'd GPT-3 from early 2022 and GPT-4 in 2024? There's very broad consensus that there is a huge difference, which is why I wanted to clarify and make sure you were comparing the right things.
What kinds of usage are you testing? For general knowledge it hallucinates far less often, and for reasoning, coding, and modifying its past code based on English instructions, it is way, way better than GPT-3 in my experience.
It's fine: you don't have a use for it, so you don't care. I personally don't spend any effort getting to know things that I don't care about and have no use for; but I also don't tell people who use tools I don't need for their job or hobby how useless those tools are and how distorted or wrong their experience using them is.
Usually people who post such claims haven’t used anything beyond GPT-3. That’s why you get questions.
Also, the difference is so big and so plainly visible that I guess people don’t even know how to answer someone who says they don’t see it. That’s why you get crickets.