Hacker News

I've been working on tool calling in llama.cpp for Phi-4, and I have a client that can switch between local and remote models for agentic work/search/etc., so I learned a lot about this situation recently:

- We can constrain the output with a JSON grammar (old-school llama.cpp)

- We can format inputs to make sure they match the model's expected format.

- Both of these combined is what llama.cpp does, via @ochafik in, inter alia, https://github.com/ggml-org/llama.cpp/pull/9639.

- ollama isn't plugged into this system AFAIK
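To sketch the "format inputs to match the model" half of this: a minimal prompt formatter in Python. The ChatML-style tokens and the tool-definition schema below are assumptions for illustration; in practice the real template ships inside the GGUF and llama.cpp applies it for you.

```python
import json

# Hypothetical tool definition; the exact schema the model was trained on
# is an assumption here -- llama.cpp derives the real one from the chat template.
TOOLS = [{
    "name": "web_search",
    "description": "Search the web",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def format_prompt(messages, tools):
    """Render messages in a ChatML-style template (assumed for this sketch)."""
    system = "You may call these tools:\n" + json.dumps(tools)
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # leave the turn open for the model
    return "\n".join(parts)

prompt = format_prompt(
    [{"role": "user", "content": "Who won the 2022 World Cup?"}], TOOLS
)
```

The grammar half is the mirror image: llama.cpp compiles the tool schemas into a GBNF grammar so the sampled tokens can only ever form a valid call.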

To OP's question, specifying the format the model expects unlocks the training the model specifically had on function calling: what I sometimes call an "agentic loop", i.e. we're dramatically increasing the odds we're singing the right tune for the model to do the right thing in this situation.
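One turn of that loop can be sketched like this: if the (grammar-constrained) output parses as a JSON tool call, dispatch it and hand the result back as a tool message; otherwise it's the final answer. The call shape and the `web_search` stub are assumptions for illustration, not the wire format any particular model uses.

```python
import json

def web_search(query: str) -> str:
    # Stub standing in for a real search backend.
    return f"results for {query!r}"

TOOL_REGISTRY = {"web_search": web_search}

def agent_step(model_output: str) -> dict:
    """One turn: run a JSON tool call if we got one, else treat it as the answer."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return {"role": "assistant", "content": model_output}
    result = TOOL_REGISTRY[call["name"]](**call["arguments"])
    return {"role": "tool", "content": result}

msg = agent_step('{"name": "web_search", "arguments": {"query": "2022 World Cup"}}')
# msg["role"] == "tool"; its content gets appended and the model is called again
```

With grammar constraints in place, the `json.loads` branch is the common case; without them you're regex-fishing malformed calls out of free text.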




Do you have thoughts on the code-style agents recommended by huggingface? The pitch for them is compelling, since structuring complex tasks in code is something very natural for LLMs. But then, I don’t see as much about this approach outside of HF.
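For readers unfamiliar with the idea: the "code-style" approach (as in Hugging Face's smolagents CodeAgent) has the model emit a Python snippet as its action instead of one JSON call per step. A simplified sketch, with execution far less sandboxed than the real library's:

```python
# Whitelisted tool stub; assumed name for illustration.
def web_search(query: str) -> str:
    return f"results for {query!r}"

def run_code_action(code: str) -> dict:
    """Exec model-written Python with only whitelisted tools in scope;
    whatever it binds to `final_answer` is the action's result.
    (The real smolagents sandboxing is much more careful than this.)"""
    local_ns = {}
    exec(code, {"__builtins__": {}, "web_search": web_search}, local_ns)
    return local_ns

# A multi-step task the model can express as one snippet,
# instead of a chain of single JSON tool calls:
model_code = (
    "hits = [web_search(q) for q in ('a', 'b')]\n"
    "final_answer = ' | '.join(hits)"
)
ns = run_code_action(model_code)
# ns["final_answer"] == "results for 'a' | results for 'b'"
```

The appeal is exactly what you say: loops, intermediate variables, and composition come for free in code, where JSON tool calling needs one round trip per step.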



