I went back and used your prompt, and it is still looping:

https://pastebin.com/VfmhCTFm

Are you using Ollama? If so, the issue may be its default context length: just 2,048 tokens. Anything beyond that is silently truncated, so "thinking" models cannot work with the default settings.

If you are using Ollama, try explicitly setting the `num_ctx` parameter in your request to something higher, like 16k or 32k, and see if the looping persists. I haven't run into that behavior once with this model.
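
For reference, here is a minimal sketch of such a request against Ollama's HTTP API, assuming a local server on the default port; the model name and prompt are placeholders:

    import requests

    # Ollama's /api/generate endpoint; per-request "options" override
    # model parameters, including the context window (num_ctx).
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "your-model",         # placeholder: the model that loops
            "prompt": "your prompt here",  # placeholder
            "stream": False,
            # Raise the context window well past the 2,048-token default
            # so the model's "thinking" tokens are not silently truncated.
            "options": {"num_ctx": 32768},
        },
    )
    print(resp.json()["response"])

From the interactive CLI, the equivalent is running `/set parameter num_ctx 32768` inside an `ollama run` session.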


I was using the CLI (which is where I live), but I will redownload and give it a try.
