I just spent a few days trying to figure out some linear algebra with the help of ChatGPT. It's very useful for finding conceptual information in the literature (which, at least for a non-professional mathematician, can be really hard to find and decipher). But in the actual math it constantly makes very silly errors, e.g. indexing a vector beyond its dimension, trying to do matrix decompositions on scalars, and insisting on multiplying matrices with mismatched dimensions.
O1 is a lot better at spotting its errors than 4o, but it too still makes a lot of really stupid mistakes. It seems to be quite far from consistently producing results on its own without at least a somewhat clueful human doing the hand-holding.
It also reliably fails basic real analysis proofs, but I think this is not too surprising, since those require a mix of logic and computation that is likely hard to infer just from the statistical likelihood of tokens.
LLMs have been very useful for me in explorations of linear algebra, because I can have an idea and say "what's this operation called?" or "how do I go from this thing to that thing?", and it'll give me the mechanism and an explanation, and then I can go read actual human-written literature or documentation on the subject.
It often gets the actual math wrong, but it is good enough at connecting the dots between my layman's intuition and the "right answer" that I can get myself over humps that I'd previously have been hopelessly stuck on.
It does make those mistakes you're talking about very frequently, but once I'm told that the thing I'm trying to do is achievable with the Gram-Schmidt process, I can go self-educate on that further.
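If it helps anyone following along: here's a minimal NumPy sketch of the textbook (classical) Gram-Schmidt process mentioned above. In practice you'd use the modified variant or np.linalg.qr for numerical stability; this is just the idea.

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize the rows of `vectors`.
    Textbook version only; prefer np.linalg.qr in real code."""
    basis = []
    for v in vectors:
        w = v.astype(float).copy()
        # Subtract the projection of v onto each basis vector found so far.
        for q in basis:
            w -= np.dot(q, v) * q
        norm = np.linalg.norm(w)
        if norm > 1e-12:  # skip (nearly) linearly dependent vectors
            basis.append(w / norm)
    return np.array(basis)

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(A)
print(np.round(Q @ Q.T, 6))  # should be (close to) the identity matrix
```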
The big thing I've had to watch out for is that it'll usually agree that my approach is a good or valid one, even when it turns out not to be. I've learned to ask my questions in the shape of "how do I", rather than "what if I..." or "is it a good idea to...", because most of the time it'll twist itself into shapes to affirm the direction I'm taking rather than challenging and refining it.
When you give it a large math problem and the answer is "seven point one three five ...", and it shows a plot of the result vs. some randomly selected ___domain, well, there could be more I'd like to know.
You can unlock a full derivation of the solution for cases where you say "Solve" or "Simplify", but what I (and I suspect GP) might want is to know why a few of the key steps might work.
It's a fantastic tool that helped get me through my (engineering) grad work, but ultimately the breakthrough inequalities that helped me write some of my best stuff came out of a book I bought in desperation that basically cataloged known linear algebra inequalities and simplifications.
When I try that kind of thing with the best LLM I can use (albeit as of a few months ago), the results can get incorrect pretty quickly.
> [...], but what I (and I suspect GP) might want is to know why a few of the key steps might work.
It's been some time since I've used the step-by-step explainer, and it was for calculus or intro physics problems at best, but IIRC the pro subscription will at least mention the method used to solve each step and link to reference materials (e.g., a clickable tag labeled "integration by parts").
It doesn't exactly explain the why, but it does provide useful keywords, in sequence, that can be used to derive the why.
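To make that concrete, a tag like "integration by parts" points at the standard identity; here's a generic worked instance (my own illustration, not Wolfram's actual output):

```latex
% integration by parts, with one worked example
\int u \, dv = uv - \int v \, du
\quad\Longrightarrow\quad
\int x e^{x} \, dx
  \;\overset{u = x,\; dv = e^{x}dx}{=}\;
  x e^{x} - \int e^{x} \, dx
  = (x - 1)\, e^{x} + C
```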
Its understanding of problems was very bad the last time I used it, meaning it was difficult to communicate what you wanted it to do. Usually I try to write in the Mathematica language, but even that is not foolproof.
Hopefully they have incorporated a more modern LLM since then, but it hasn't been that long.
Wolfram Alpha's "smartness" is often Clippy-level enraging. E.g. it makes assumptions about symbols based on their names (e.g. a is assumed to be a constant, derivatives are taken w.r.t. x). Even with Mathematica syntax it tends to make such assumptions and refuses to lift them even when explicitly directed. Quite often one has to change the variable symbols used just to try to make Alpha do what's meant.
What's surprising to me is that this would surely be in OpenAI's interests, too -- free RLHF!
Of course there would be the risk of adversaries giving bogus feedback, but my gut says it's relatively straightforward to filter out most of this muck.
Wolfram Alpha can solve equations well, but it is terrible at understanding natural language.
For example I asked Wolfram Alpha "How heavy a rocket has to be to launch 5 tons to LEO with a specific impulse of 400s", which is a straightforward application of the Tsiolkovsky rocket equation. Wolfram Alpha gave me some nonsense about particle physics (result: 95 MeV/c^2), GPT-4o did it right (result: 53.45 tons).
Wolfram Alpha knows about the Tsiolkovsky rocket equation, it knows about LEO (low Earth orbit), but I found no way to get a delta-v out of it; again, more nonsense. It tells me about Delta Airlines and mentions satellites that it knows are not in LEO. The "natural language" part is a joke. It is more like an advanced calculator, and for that, it is great.
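For what it's worth, the GPT-4o figure is easy to sanity-check with the rocket equation directly. This sketch assumes a delta-v to LEO of about 9.3 km/s and treats everything except the payload as propellant (my assumptions, not part of the original prompt):

```python
import math

# Tsiolkovsky rocket equation: delta_v = Isp * g0 * ln(m0 / mf)
isp = 400.0       # s, specific impulse from the question
g0 = 9.80665      # m/s^2
delta_v = 9300.0  # m/s, a typical LEO figure including losses (assumed)
payload = 5.0     # tons; dry mass of the rocket itself is neglected here

mass_ratio = math.exp(delta_v / (isp * g0))
m0 = payload * mass_ratio
print(f"mass ratio ~ {mass_ratio:.2f}, launch mass ~ {m0:.1f} tons")
# ~10.7 and ~53.5 tons, in the same ballpark as the 53.45 t above
```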
You know, "You're using it wrong" is usually meant to carry an ironic or sarcastic tone, right?
It dates back to Steve Jobs blaming an iPhone 4 user for "holding it wrong" rather than acknowledging a flawed antenna design that was causing dropped calls. The closest Apple ever came to admitting that it was their problem was when they subsequently ran an employment ad to hire a new antenna engineering lead. Maybe it's time for Wolfram to hire a new language-model lead.
No, “holding it wrong” is the sarcastic version. “You’re using it wrong” is a super common way to tell people they are literally using something wrong.
The problem has always been that you only get good answers if you happen to stumble on a specific question that it can handle. Combining Alpha with an LLM could actually be pretty awesome, but I'm sure it's easier said than done.
Before LLMs exploded nobody really expected WA to perform well at natural language comprehension. The expectations were at the level of "an ELIZA that knows math".
Wolfram Alpha is mostly for "trivia" type problems. Or giving solutions to equations.
I was figuring out some mode decomposition methods such as ESPRIT and Prony, and how to potentially extend/customize them. Wolfram Alpha doesn't seem to have a clue about any of that.
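For reference, the classical Prony method itself fits in a few lines of NumPy; this is just the textbook version (no noise handling), not ESPRIT or the extensions being explored:

```python
import numpy as np

def prony(x, p):
    """Minimal classical Prony's method: model x[n] ~= sum_k A_k * z_k**n
    with p exponentials. Returns poles z_k and amplitudes A_k."""
    N = len(x)
    # 1. Linear prediction: x[n] = -(a_1 x[n-1] + ... + a_p x[n-p])
    M = np.column_stack([x[p - i : N - i] for i in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(M, -x[p:N], rcond=None)
    # 2. Poles are the roots of the prediction polynomial
    z = np.roots(np.concatenate(([1.0], a)))
    # 3. Amplitudes from the Vandermonde system x[n] = sum_k A_k z_k**n
    V = z[np.newaxis, :] ** np.arange(N)[:, np.newaxis]
    amps, *_ = np.linalg.lstsq(V, x, rcond=None)
    return z, amps

# Example: two decaying exponentials, recovered almost exactly.
n = np.arange(30)
x = 2.0 * 0.9**n + 1.0 * 0.5**n
z, amps = prony(x, 2)
print(np.round(z, 4), np.round(amps, 4))  # poles ~ {0.9, 0.5}, amps ~ {2, 1} (order may vary)
```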
Probably mostly not. The errors tend to be logical/conceptual; e.g. mixing up scalars and matrices is unlikely to come from tokenization, especially if you put spaces between the variables and operators, as AFAIK GPTs don't form tokens spanning spaces (although tokens may start or end with them).
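You can check that claim yourself with the tiktoken package and the cl100k_base encoding used by GPT-4-era models (the exact splits in the comment below are my guess; the point is just that no token bridges the space between two symbols):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["A+B", "A + B"]:
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{text!r} -> {pieces}")

# Expected shape of the output: no token spans the space between two symbols,
# though a token may carry a leading space, e.g. 'A + B' -> ['A', ' +', ' B'].
```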
The only thing I've consistently had issues with while using AI is graphs. If I ask it to plot some simple function, it produces a really weird image that has nothing to do with the graph I want. It will be a weird swirl of lines and words, and it never corrects itself no matter what I say to it.
Has anyone had any luck with this? It seems like the only thing that it just can't do.
And it works very well - it made me a nice general "draw successively accurate Fourier series approximations given this lambda for the coefficients and this lambda for the constant term" script. PNG output, no real programming errors (I wouldn't remember if it had some stupid error; I'm a Python programmer). Even TikZ in LaTeX isn't hopeless (although I did end up reading the TikZ manual).
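Roughly the shape of script I mean (my reconstruction, not the code ChatGPT actually generated), using a square wave as the example:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_partial_sums(coeff, const, orders, filename="fourier.png"):
    """Plot successive partial sums S_N(x) = const() + sum_{n=1..N} coeff(n, x)."""
    x = np.linspace(-np.pi, np.pi, 1000)
    for N in orders:
        s = np.full_like(x, const())
        for n in range(1, N + 1):
            s += coeff(n, x)
        plt.plot(x, s, label=f"N = {N}")
    plt.legend()
    plt.savefig(filename)  # PNG output, as mentioned above

# Square wave sign(x): its Fourier series has only odd sine terms, b_n = 4/(pi*n).
plot_partial_sums(
    coeff=lambda n, x: (4 / (np.pi * n)) * np.sin(n * x) if n % 2 == 1 else 0 * x,
    const=lambda: 0.0,
    orders=[1, 3, 9, 27],
)
```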
Ask it to plot the graph with Python plotting utilities, not its image generator. I think you need a ChatGPT subscription, though, for it to be able to run Python code.
You seem to get 2(?) free Python program runs per week(?) as part of the O1 preview.
When you visit ChatGPT on the free account, it automatically gives you the best model, then disables it after some amount of work and says to come back later or upgrade.
It was, for a while. I think this is an area where there may have been some regression. It can still write code to solve problems that are a poor fit for the language model, but you may need to ask it to do that explicitly.
The agentic reasoning models should be able to fix this if they have the ability to run code instead of handling every task themselves. "I need to make a graph" -> "LLMs have difficulty graphing novel functions" -> "Call Python instead" is a line of reasoning I would expect after seeing what O1 has come up with on other problems.
Giving AI the ability to execute code is the safety people's nightmare, though. I wonder if we'll hear anything from them, as this is surely coming.