
I just spent a few days trying to figure out some linear algebra with the help of ChatGPT. It's very useful for finding conceptual information from the literature (which, for a non-professional mathematician at least, can be really hard to find and decipher). But in the actual math it constantly makes very silly errors, e.g. indexing a vector beyond its dimension, trying to do matrix decomposition on scalars, and insisting on multiplying matrices with mismatching dimensions.

O1 is a lot better at spotting its errors than 4o, but it too still makes a lot of really stupid mistakes. It seems quite far from consistently producing results on its own without at least a somewhat clueful human doing the hand-holding.




It also reliably fails basic real analysis proofs, but I think this is not too surprising, since those require a mix of logic and computation that is likely hard to infer just from the statistical likelihood of tokens.


LLMs have been very useful for me in explorations of linear algebra, because I can have an idea and say "what's this operation called?" or "how do I go from this thing to that thing?", and it'll give me the mechanism and an explanation, and then I can go read actual human-written literature or documentation on the subject.

It often gets the actual math wrong, but it is good enough at connecting the dots between my layman's intuition and the "right answer" that I can get myself over humps that I'd previously have been hopelessly stuck on.

It does make those mistakes you're talking about very frequently, but once I'm told that the thing I'm trying to do is achievable with the Gram-Schmidt process, I can go self-educate on that further.
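
(For the curious, this is roughly what classical Gram-Schmidt does; a minimal numpy sketch of my own, not something ChatGPT produced:)

    import numpy as np

    def gram_schmidt(A):
        """Return Q whose columns are an orthonormal basis for the columns of A."""
        Q = []
        for a in A.T:                        # iterate over columns of A
            v = a.astype(float)
            for q in Q:                      # subtract projections onto earlier vectors
                v = v - (q @ a) * q
            norm = np.linalg.norm(v)
            if norm > 1e-12:                 # skip (numerically) dependent columns
                Q.append(v / norm)
        return np.column_stack(Q)

    A = np.array([[1.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
    Q = gram_schmidt(A)
    print(np.round(Q.T @ Q, 6))              # ~ identity, i.e. columns are orthonormal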

The big thing I've had to watch out for is that it'll usually agree that my approach is a good or valid one, even when it turns out not to be. I've learned to ask my questions in the shape of "how do I", rather than "what if I..." or "is it a good idea to...", because most of the time it'll twist itself into shapes to affirm the direction I'm taking rather than challenging and refining it.


Very close to my experience.


Isn't Wolfram Alpha a better "ChatGPT of Math"?


Wolfram Alpha is better at actually doing math, but far worse at explaining what it’s doing, and why.


What’s worse about it?

It never tells you the wrong thing, at the very least.


When you give it a large math problem and the answer is "seven point one three five ... ", and it shows a plot of the result vs. some randomly selected ___domain, well, there could be more I'd like to know.

You can unlock a full derivation of the solution for cases where you say "Solve" or "Simplify", but what I (and I suspect GP) might want is to know why a few of the key steps might work.

It's a fantastic tool that helped get me through my (engineering) grad work, but ultimately the breakthrough inequalities that helped me write some of my best stuff were out of a book I bought in desperation that basically cataloged known linear algebra inequalities and simplifications.

When I try that kind of thing with the best LLM I can use (as of a few months ago, at least), the results can get incorrect pretty quickly.


> [...], but what I (and I suspect GP) might want is to know why a few of the key steps might work.

It's been some time since I've used the step-by-step explainer, and it was for calculus or intro physics problems at best, but IIRC the pro subscription will at least mention the method used to solve each step and link to reference materials (e.g., a clickable tag labeled "integration by parts"). Doesn't exactly explain why but does provide useful keywords in a sequence that can be used to derive the why.


What book was it that you found helpful?


A Survey of Matrix Theory and Matrix Inequalities - Marvin Marcus

https://www.amazon.com/gp/product/048667102X/ref=ppx_yo_dt_b...


I'm reviewing linear algebra now and would also love to know that book!


It was this one (in case you miss sibling response): https://www.amazon.com/gp/product/048667102X/ref=ppx_yo_dt_b...

I make no claim about its usefulness for anyone else!


Its understanding of problems was very bad the last time I used it, meaning it was difficult to communicate what you wanted it to do. Usually I try to write in the Mathematica language, but even that is not foolproof.

Hopefully they have incorporated a more modern LLM since then, but it hasn't been that long.


Wolfram Alpha's "smartness" is often Clippy-level enraging. E.g. it makes assumptions about symbols based on their names (e.g. a is assumed to be a constant, derivatives are taken w.r.t. x). Even with Mathematica syntax it tends to make such assumptions and refuses to lift them even when explicitly directed. Quite often one has to change the variable symbols used just to make Alpha do what's meant.


I wish there were a way to tell ChatGPT where it has made a mistake, with a single mouse click.


What's surprising to me is that this would surely be in OpenAI's interests, too -- free RLHF!

Of course there would be the risk of adversaries giving bogus feedback, but my gut says it's relatively straightforward to filter out most of this muck.


Is the explanation a pro feature? At the very end it says "step by step? Pay here"


Wolfram Alpha can solve equations well, but it is terrible at understanding natural language.

For example, I asked Wolfram Alpha "How heavy a rocket has to be to launch 5 tons to LEO with a specific impulse of 400s", which is a straightforward application of the Tsiolkovsky rocket equation. Wolfram Alpha gave me some nonsense about particle physics (result: 95 MeV/c^2); GPT-4o did it right (result: 53.45 tons).

Wolfram Alpha knows about the Tsiolkovsky rocket equation and it knows about LEO (low Earth orbit), but I found no way to get a delta-v out of it; again, more nonsense. It tells me about Delta Airlines and mentions satellites that it knows are not in LEO. The "natural language" part is a joke. It is more like an advanced calculator, and for that, it is great.
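
For reference, the back-of-the-envelope version I had in mind (my assumptions: a delta-v to LEO of roughly 9.3 km/s including losses, and the 5 t payload taken as the entire final mass):

    import math

    isp = 400.0        # s, specific impulse (from the prompt)
    g0 = 9.80665       # m/s^2, standard gravity
    dv = 9300.0        # m/s, assumed delta-v budget to LEO, incl. losses
    m_final = 5.0      # t, the 5 t payload treated as the entire final mass

    ve = isp * g0                                # effective exhaust velocity
    m_initial = m_final * math.exp(dv / ve)      # Tsiolkovsky rocket equation
    print(f"initial mass ~ {m_initial:.1f} t")   # ~53 t, in line with GPT-4o's 53.45 t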


You're using it wrong: you can use natural language in your equation, but AFAIK it's not supposed to be able to do what you're asking of it.


You know, "You're using it wrong" is usually meant to carry an ironic or sarcastic tone, right?

It dates back to Steve Jobs blaming an iPhone 4 user for "holding it wrong" rather than acknowledging a flawed antenna design that was causing dropped calls. The closest Apple ever came to admitting that it was their problem was when they subsequently ran an employment ad to hire a new antenna engineering lead. Maybe it's time for Wolfram to hire a new language-model lead.


No, “holding it wrong” is the sarcastic version. “You’re using it wrong” is a super common way to tell people they are literally using something wrong.


But they're not using it wrong. They are using it as advertised by Wolfram themselves (read: himself).

The GP's rocket equation question is exactly the sort of use case for which Alpha has been touted for years.


It's not an LLM. You're simply asking too much of it. It doesn't work the way you want it to, sorry.


Tell Wolfram. They're the ones who've been advertising it for years, well before LLMs were a thing, using English-language prompts like these examples: https://www.pcmag.com/news/23-cool-non-math-things-you-can-d...

The problem has always been that you only get good answers if you happen to stumble on a specific question that it can handle. Combining Alpha with an LLM could actually be pretty awesome, but I'm sure it's easier said than done.


Before LLMs exploded nobody really expected WA to perform well at natural language comprehension. The expectations were at the level of "an ELIZA that knows math".


Correct, so it isn't a "ChatGPT of Math", which was the point.


Wolfram Alpha is mostly for "trivia" type problems. Or giving solutions to equations.

I was figuring out some mode decomposition methods such as ESPRIT and Prony and how to potentially extend/customize them. Wolfram Alpha doesn't seem to have a clue about such things.
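
(In case anyone is curious, here is a toy numpy sketch of plain Prony, my own simplification rather than anything either tool produced; ESPRIT is similar in spirit but goes through a signal-subspace SVD instead:)

    import numpy as np

    def prony(x, p, dt):
        """Fit x[n] ~ sum_k c_k * exp(s_k * n * dt), k = 1..p (plain Prony)."""
        N = len(x)
        # 1) linear prediction: x[n] = -(a_1 x[n-1] + ... + a_p x[n-p]) for n >= p
        A = np.column_stack([x[p - k - 1:N - k - 1] for k in range(p)])
        a = np.linalg.lstsq(A, -x[p:], rcond=None)[0]
        # 2) roots of the prediction polynomial are the discrete poles z_k = exp(s_k dt)
        z = np.roots(np.concatenate(([1.0], a)))
        s = np.log(z.astype(complex)) / dt
        # 3) complex amplitudes from a Vandermonde least-squares fit
        V = np.vander(z, N, increasing=True).T        # V[n, k] = z_k**n
        c = np.linalg.lstsq(V, x.astype(complex), rcond=None)[0]
        return s, c   # mode frequencies: s.imag / (2*pi), dampings: s.real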


No. Wolfram Alpha can't solve anything that isn't a function evaluation or equation. And it can't do modular arithmetic to save its unlife.

WolframOne/Mathematica is better, but that requires the user (or ChatGPT!) to write complicated code, not natural language queries.


I wonder if these are tokenization issues? I really am curious about Meta's byte tokenization scheme...


Probably mostly not. The errors tend to be logical/conceptual. E.g. mixing up scalars and matrices is unlikely to be from tokenization. Especially if using spaces between the variables and operators, as AFAIK GPTs don't form tokens over spaces (although tokens may start or end with them).
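
If you want to check how a given expression actually tokenizes, the tiktoken package makes that easy (assuming the o200k_base encoding is what the model you use actually sees):

    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")     # encoding used by the GPT-4o family
    expr = "A x = b"
    print([enc.decode([t]) for t in enc.encode(expr)])   # one string per token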


The only thing I've consistently had issues with while using AI is graphs. If I ask it to plot some simple function, it produces a really weird image that has nothing to do with the graph I want. It will be a weird swirl of lines and words, and it never corrects itself no matter what I say to it.

Has anyone had any luck with this? It seems like the only thing that it just can't do.


You're doing it wrong. It can't produce proper graphs with its diffusion-style image generation.

Ask it to produce graphs with python and matplotlib. That will work.
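
Something along these lines is what it will typically write (a rough sketch, not actual ChatGPT output):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-2 * np.pi, 2 * np.pi, 500)
    plt.plot(x, np.sin(x) / x)     # whatever "simple function" you asked for
    plt.title("sin(x)/x")
    plt.show()                     # or plt.savefig("plot.png") if you run it yourself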


And it works very well - it made me a nice general "draw successively accurate Fourier series approximations given this lambda for coefficients and this lambda for the constant term". PNG output, no real programming errors (I wouldn't remember if it had some stupid error; I'm a Python programmer). Even TikZ in LaTeX isn't hopeless (although I did end up reading the TikZ manual).
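
Not the code it actually generated, but a rough sketch of the same idea, with square-wave coefficients picked just as an example:

    import numpy as np
    import matplotlib.pyplot as plt

    b = lambda n: 4 / (np.pi * n) if n % 2 == 1 else 0.0   # sine coefficients
    a0 = lambda: 0.0                                        # constant term

    x = np.linspace(-np.pi, np.pi, 1000)
    for terms in (1, 3, 9, 27):                             # successively better partial sums
        y = a0() + sum(b(n) * np.sin(n * x) for n in range(1, terms + 1))
        plt.plot(x, y, label=f"{terms} terms")
    plt.legend()
    plt.savefig("fourier.png")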


Ask it to plot the graph with Python plotting utilities, not its image generator. I think you need a ChatGPT subscription, though, for it to be able to run Python code.


You seem to get 2(?) free Python program runs per week(?) as part of the O1 preview.

When you visit ChatGPT on the free account, it automatically gives you the best model, then disables it after some amount of work and says to come back later or upgrade.


Just install Python locally, and copy paste the code.


Shouldn’t ChatGPT be smart enough to know to do this automatically, based on context?


It was, for a while. I think this is an area where there may have been some regression. It can still write code to solve problems that are a poor fit for the language model, but you may need to ask it to do that explicitly.


The agentic reasoning models should be able to fix this if they have the ability to run code instead of giving every task to the model itself. "I need to make a graph" -> "LLMs have difficulty graphing novel functions" -> "call Python instead" is a line of reasoning I would expect, after seeing what O1 has come up with on other problems.

Giving AI the ability to execute code is the safety people's nightmare, though. I wonder if we'll hear anything from them, as this is surely coming.


Don't most mathematical papers contain at least one such error?


Where is this data from?


It's a question, and to be fair to the AI it should really refer to papers before review.


Yes, it's a question, but you haven't said what you read that makes you suspect so.



