My guess is that those questions are typical: they follow familiar patterns and use well-established processes. Give it something weird and it'll continuously trip over itself.
My current project is nothing too bizarre: it's a 3D renderer. Well-trodden ground. But my project breaks a lot of core assumptions and common conventions, so every LLM I try to introduce (Gemini 2.5 Pro, Claude 3.7 Thinking, o3) tangles itself up between what's actually in the codebase and the strong pull of what's in the training data.
I tried layering on reminders and guidance in the prompting, but ultimately I just ended up narrowing its view, limiting its insight, and stripping away even the context that this is a 3D renderer and not just pure geometry.
> Give it something weird and it'll continuously trip over itself.
And so will almost all humans. It's weird how people refuse to ascribe any human-level intelligence to it until it starts to compete with the world's top elite.
Yeah, but humans can be made to understand when and how they're wrong and narrow their focus to fixing the mistake.
LLMs apologize and then proudly present the exact same output as before, repeatedly, forever spinning their wheels at the first major obstacle to their reasoning.
> LLMs apologize and then proudly present the exact same output as before, repeatedly, forever spinning their wheels at the first major obstacle to their reasoning.
So basically like a human, at least up to young adult years in a teaching context[0], where the student is subject to the authority of the teacher (parent, tutor, schoolteacher) and can't easily weasel out of the entire exercise. Yes, even young adults will get stuck in a loop, presenting "the exact same output as before, repeatedly, forever spinning their wheels at the first major obstacle to their reasoning", or at least until something clicks, or they give up in shame (or the teacher does).
As someone currently engaged in teaching the Adobe suite to high school students, that doesn't track with what I see. When my students are getting stuck and frustrated, I look at the problem, remind them of the constraints and assumptions the software operates under. Almost always they realize the problem without me spelling it out, and they reinforce the mental model of the software they're building. Often noticing me lurking and about to offer help is enough for them to pause, re-evaluate, and catch the error in their thinking before I can get out a full sentence.
Reminding LLMs of the constraints they're bumping into doesn't help. They haven't forgotten, after all. The best performance I got out of the LLMs in the project I mentioned upthread was a loop of trying out different functions, pausing, re-evaluating, realizing in its chain of thought that the approach didn't fit the constraints, and then trying a slightly different way of phrasing the exact same approach. Humans will stop slamming their heads into a wall eventually. I sat there watching Gemini 2.5 Pro internally spew out maybe 10 variations of the same function before I pulled the tokens it was chewing on out of its mouth.
Yes, sometimes students get frustrated and bail, but they have the capacity to learn and try something new. If you wander into an area that's adjacent to but decidedly not in their training data, the LLMs feel the pull of the training data too strongly and fall right into that rut, forgetting where they are.
A human can play tictactoe or any other simple game within a few minutes of having the rules described to them. An AI will do all kinds of interesting things that are either against the rules or extremely poor choices.
Yeah, I tried playing tictactoe with ChatGPT and it did not do well.
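If anyone wants to reproduce this, the simplest setup is to referee the game yourself: keep the board in a small script and check each move the model proposes against the rules. Here's a minimal sketch of that kind of referee (plain Python; the function names and the example move list are mine, and there's no LLM call involved, you transcribe the model's moves by hand):

```python
# Minimal tictactoe referee: validates moves and detects wins.
# No LLM API here; paste in the moves the model proposes and watch for illegal ones.

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def new_board():
    return [" "] * 9

def legal(board, cell):
    """A move is legal if the cell index is on the board and unoccupied."""
    return 0 <= cell <= 8 and board[cell] == " "

def play(board, cell, mark):
    if not legal(board, cell):
        raise ValueError(f"cell {cell} is invalid or already taken ({board[cell]!r})")
    board[cell] = mark

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def show(board):
    rows = [board[i:i + 3] for i in (0, 3, 6)]
    print("\n-----\n".join("|".join(r) for r in rows))

if __name__ == "__main__":
    b = new_board()
    # Hypothetical transcript of a game; the last move repeats an occupied cell.
    moves = [(4, "X"), (0, "O"), (2, "X"), (6, "O"), (6, "X")]
    for cell, mark in moves:
        try:
            play(b, cell, mark)
        except ValueError as err:
            print(f"{mark} proposed an illegal move: {err}")
            break
        show(b)
        if winner(b):
            print(f"{winner(b)} wins")
            break
```

In my experience the failures show up exactly where this catches them: replaying an occupied cell, or claiming a win when no line on the board is complete.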
LLMs are limited by their context windows, so as long as the problem can be solved within that small window, they do great.
Human neural networks are constantly being retrained, so their effective context window is huge. The LLM may be better at a complex, well-specified 200-line Python program, but the human brain is better at the 1M-line real-world application. It takes some study, though.
Yeah, well, it's also one of the top scorers on the math olympiads.