It's not the job of the LLM to run the code... if you ask it to run the code, it will just produce its best approximation of what the code seems to output. It's not actually running it.
Just like Dall-E is not layering coats of paint to make a watercolor... it just makes something that looks like one.
Your LLM (or you) should run the code in a code interpreter. Which ChatGPT did, because it has access to tools. Your local ones don't.
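To make that concrete: "access to tools" just means something outside the model actually executes the code and feeds the real output back. A minimal sketch of what that looks like locally (the `run_code` helper here is made up for illustration, not any specific framework's API):

```python
import subprocess
import sys

def run_code(snippet: str, timeout: float = 5.0) -> str:
    """Actually execute a Python snippet in a subprocess and return its
    real output, instead of asking the model to guess what it prints."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Return stderr on failure so the model (or you) sees the real error.
    return result.stdout if result.returncode == 0 else result.stderr

print(run_code("print(sum(range(10)))"))  # → 45
```

The point is that `45` comes from the interpreter, not from the model pattern-matching on what `sum(range(10))` "looks like" it should return. Local setups can wire something like this in as a tool; without it, the model is only imitating output.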