This problem of non-technical product folks over-promising features is going to get much worse in the age of LLMs. The models are incredibly adept at providing a proof-of-concept. But it's often a monumental endeavour to cross the gap between 70% and 90% accuracy, and harder still to get from 90% to 95%. This long-tail pain isn't new, but the ability of non-technical folks to poke a model on a chat site and then assume their idea is ready to be deployed to production is a more recent phenomenon.
> The models are incredibly adept at providing a proof-of-concept. But it's often a monumental endeavour to cross the gap
That's not just with LLMs. This has been an Achilles' heel of demos for decades.
It's usually quite easy to make a demo of something sexy, but making it ship is a very, very different thing.
This is made exponentially worse by managers thinking that the demo means "we're almost ready," and even by the engineers themselves, who are often not experienced enough to understand that demo != ship. It has also been my experience that researchers who come up with amazing ideas have absolutely no appreciation for what it takes to turn those ideas into shipping product. Since Apple is all about shipping product, they have to bridge that gulf. They are trying to commoditize something that is mostly buzz and raw research right now.
I'm really pretty jaded about ML and LLMs in general right now. I feel as if a whole bunch of really sexy demos were released, creating a huge level of buzz, but the shipping product is still a ways off.
I don’t disagree. I do think there will be a tendency to say, “We can do X using AI,” when X can happen but isn’t guaranteed to happen by the system.
Here, it doesn’t sound like the features promised were truly demo-able, and when they were, they were “not working properly up to a third of the time”.
Having a random CRUD MVP that is two-thirds done is different from having a SOTA LLM implementation that is only two-thirds reliable. It is a vastly different problem to get from there to the finish line.
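To put rough numbers on that (mine, not from the article): per-call reliability compounds across a session, so a feature that fails a third of the time all but guarantees users will see failures. A quick sketch, assuming independent calls:

```python
# Hypothetical illustration: chance a user hits at least one failure in a
# session, assuming each call succeeds independently with probability p.
def p_any_failure(p_success: float, n_calls: int) -> float:
    return 1 - p_success ** n_calls

for p in (0.67, 0.90, 0.95, 0.99):
    print(f"per-call success {p:.0%}: "
          f"{p_any_failure(p, 10):.0%} of 10-call sessions see a failure")
```

Under that (simplistic) independence assumption, even 95% per-call reliability leaves roughly 40% of ten-call sessions with a visible failure, which is why the last few points of accuracy are where most of the engineering effort goes.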
But I think marketing types would be equally likely to make promises in both scenarios.
I’m getting this sense too, as my own employer is starting to add AI features. The pitches for what is possible are not grounded in what engineers know the system can do; they’re just assumptions along the lines of “if we give the LLM these inputs, we’d expect outputs accurate enough for meaningful productivity gains.”
I’m not an AI skeptic, but it’ll be interesting to see how we manage the uncertainty in these projects.