Toolformer: Language models can teach themselves to use tools (arxiv.org)
155 points by azhenley on Feb 12, 2023 | 18 comments



I think of those cases where people ask GPT-3 to write a Python program. For instance, one person found that when they asked GPT-3 a question like "How many prime numbers are there between 14 and 321?" they would get an almost-right answer. A better route to a right answer in that case is to have it write a Python program and run it, because the only way to answer that correctly is to count them all, and Python is good at that.
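
To make that concrete, here is a minimal sketch of the kind of program it could write and run (the range comes from the example question; the exact code an LLM emits would of course vary):

    # Count primes between 14 and 321 by trial division: a short,
    # exact computation instead of a pattern-matched guess.
    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))

    print(sum(1 for n in range(14, 322) if is_prime(n)))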


Current LLMs can’t reason like that. If their training text contained a question and response similar to the one they are asked, those will have been encoded in their network weights, and they will produce an answer of a similar form but randomly varying in the details.

This is why they produce answers to maths questions in the right form, but wrong in the details. It’s why, when asked when and where a famous person was born, they will often respond with an answer that looks highly plausible but is wrong in the details.

What this mechanism might do, though, is enable it to identify this as a question best answered by one of its available APIs. That API might be to an LLM trained to write code to solve maths problems, which would write the code, run it, generate a response, and pass that back through the API.

So this way you could build a hierarchy of problem solvers: some of them procedural, some of them ___domain-specific LLMs, and some of them problem solvers that themselves use LLMs as components to help solve that problem ___domain.
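
A toy sketch of such a hierarchy, with every LLM and API stubbed out (the solver names and routing rule are made up purely for illustration):

    # Toy router over a hierarchy of problem solvers. Each stub stands in
    # for a component described above: a code-writing LLM, a factual
    # lookup API, and a plain general-purpose LLM as the fallback.
    def code_writing_solver(question):
        return "stub: an LLM would write Python here, run it, return the result"

    def lookup_solver(question):
        return "stub: a search or knowledge-base API would be called here"

    def general_solver(question):
        return "stub: fall back to the base LLM"

    SOLVERS = {"maths": code_writing_solver, "factual": lookup_solver}

    def route(question):
        # Stand-in classifier; in practice the LLM itself could decide
        # which of its available APIs best answers the question.
        label = "maths" if any(ch.isdigit() for ch in question) else "factual"
        return SOLVERS.get(label, general_solver)(question)

    print(route("How many prime numbers are there between 14 and 321?"))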

Interesting times.


You can build stuff like this today; LangChain also lets an agent use bash, Python, and the requests library as utility tools (see the agents section of the docs): https://langchain.readthedocs.io/en/latest/modules/agents/ge...
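
Something along these lines, for example (a rough sketch assuming the LangChain agent API as documented around this time; module paths and tool names may differ in other versions):

    # Rough sketch: a LangChain agent given Python, shell and HTTP tools.
    from langchain.llms import OpenAI
    from langchain.agents import load_tools, initialize_agent

    llm = OpenAI(temperature=0)
    tools = load_tools(["python_repl", "terminal", "requests"], llm=llm)
    agent = initialize_agent(tools, llm,
                             agent="zero-shot-react-description",
                             verbose=True)
    agent.run("How many prime numbers are there between 14 and 321?")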


What they're describing in the paper goes much further than LangChain. Toolformer is integrating API calls into the training and inference of an LLM.

Short video explainer here: https://youtu.be/ZCdqfIuT81A
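
Concretely (per the paper), the model emits an inline call such as "[Calculator(400 / 1400) →", decoding pauses at the arrow, the tool runs, and its result is spliced back into the text before generation continues. A stubbed sketch of that inference loop (the model itself is replaced by a canned string here):

    import re

    # Toy sketch of Toolformer-style inference: detect an inline API call
    # at the end of the generated text, execute it, splice the result in.
    TOOLS = {"Calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

    def fake_model(prompt):
        # Stand-in for the fine-tuned LLM; it stops right after emitting "→".
        return prompt + " That is [Calculator(400 / 1400) →"

    def run_with_tools(prompt):
        text = fake_model(prompt)
        call = re.search(r"\[(\w+)\((.*)\) →$", text)
        if call:
            result = TOOLS[call.group(1)](call.group(2))
            text += " " + result + "]"
            # A real system would now resume generation with the tool
            # result visible in the model's context.
        return text

    print(run_with_tools("Out of 1400 participants, 400 passed the test."))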


They used GPT-J, an open-source LLM, for this. Anyone can use this technique privately to integrate with their proprietary / private APIs too.



Thanks. I submitted this two days ago so it must have been put in the second chance queue.


I think this is probably the sort of way AGI will work.


A multimodal LLM using tools definitely feels like the closest we’ve ever been to AGI.


If we’re at a place where an LLM can make and use tools, I’m excited about the limits of its ability to create tools... specifically, to create new tools.

For me that kind of entailment is one of the more exciting pillars of general intelligence, and one that’s unique to humans (I think?).


The next step could be pairing multiple LLMs with code synthesisers. The first LLM could describe the target API, a second LLM would build said API, and the query would then be resolved against it.
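
A very rough, stubbed sketch of that pipeline (both model calls are canned stand-ins; only the final execution step is real):

    # Toy pipeline: one model specifies the target API, a second one
    # synthesises it, then the query is resolved by calling the result.
    def describe_api(query):
        # Stand-in for the first LLM: turn the query into an API spec.
        return "count_primes(lo: int, hi: int) -> int"

    def synthesise(spec):
        # Stand-in for the code-synthesis LLM: emit an implementation.
        return (
            "def count_primes(lo, hi):\n"
            "    def is_prime(n):\n"
            "        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))\n"
            "    return sum(1 for n in range(lo, hi + 1) if is_prime(n))\n"
        )

    namespace = {}
    exec(synthesise(describe_api("How many primes between 14 and 321?")), namespace)
    print(namespace["count_primes"](14, 321))  # resolve the original query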

I think at that point we will be very close to an AGI.


To me, what we are seeing shows what a failure the field of the philosophy of artificial intelligence has been at preparing us linguistically to have discussions about all this.

Even when we have real AGI, some people will argue we don't because it can't cook a hamburger. Then even if we make a robot AI arm to cook a hamburger, people will complain the arm can't play the guitar.


What will people complain about in the end?


Could this be applied to physical tools, such as audio sensors or motorized manipulator arms?


If cats had opposable thumbs, we'd all be dead. Toolformer is in that vein.


[flagged]


Lee, please, less blatant with the ads. Thank ye


What does "100 requests a month free tier. $0.01 USD per request" mean?

Is it free or not?


cute



