It's all just about awareness of context. Want to improve it? Simply add a term to the prompt to unlock more considerations. Assuming we haven't reached the edge of the context window, every new word "unlocks" new vectors with more context that the language model adds to its considerations.
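To make that concrete, here's a trivial sketch of what I mean by "adding a term": prepend a couple of explicit context words so the model has something to latch onto. The helper name and wording are purely illustrative, not any particular library's API.

```python
def enrich(prompt: str, context_terms: list[str]) -> str:
    """Prepend explicit context terms so the model attends to them first."""
    return "Context: " + ", ".join(context_terms) + "\n\n" + prompt

# "file permissions" and "sudo" are the extra terms that unlock the
# relevant considerations for this question.
print(enrich("Why can't my script read /etc/shadow?",
             ["Linux", "file permissions", "sudo"]))
```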
The similarity with how the human brain (seems to) work is so remarkable that it doesn't even make sense not to use it as an analogue for how to better use language models.
When the results can be achieved the same way (manipulating an LLM with the right words, just as you'd manipulate a human brain), why believe there's a difference?
This is stuff one can learn over time by using and researching 3B models. While most people seem to shun them, some of them are extremely powerful, like the "old" orca mini 3B. I am still using that one! All they really need is better prompts, and with that approach they work perfectly fine.
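For anyone who wants to try it, a minimal sketch of running orca mini 3B locally via llama-cpp-python. The GGUF file name is a placeholder, and the `### System / ### User / ### Response` template is what I believe orca mini expects; check the model card for your particular build.

```python
from llama_cpp import Llama

# Placeholder path: point this at whatever GGUF build of orca-mini 3B you have.
llm = Llama(model_path="orca-mini-3b.q4_0.gguf", n_ctx=2048)

prompt = (
    "### System:\n"
    "You are a careful assistant. Answer in the context of Linux file permissions.\n\n"
    "### User:\n"
    "Why can't my script read /etc/shadow?\n\n"
    "### Response:\n"
)

out = llm(prompt, max_tokens=200, stop=["### User:"])
print(out["choices"][0]["text"].strip())
```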
The biggest hurdle I've found is the usually small context window of such small models, but there are ways of cheating around that without sacrificing too much quality: a small RoPE extension, summarizing text, adding context words, or leaving letters out of words in the prompt, all of which virtually increase the size of the context window.
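Here's a sketch of two of those cheats combined, again assuming llama-cpp-python: scale RoPE a little to stretch the window, and summarize the oldest part of the prompt with the model itself when it would overflow. The exact numbers are arbitrary and should be tuned against your quality tolerance.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="orca-mini-3b.q4_0.gguf",  # placeholder path
    n_ctx=4096,           # ask for more positions than the model was trained with...
    rope_freq_scale=0.5,  # ...and compress RoPE so they still map into range (costs some quality)
)

def fit_prompt(history: str, question: str, budget: int = 3500) -> str:
    """Shrink `history` until history + question fits the token budget."""
    ntok = lambda text: len(llm.tokenize(text.encode("utf-8")))
    while history and ntok(history) + ntok(question) > budget:
        # Summarize the oldest half of the history (split by characters,
        # which is crude but fine for a sketch) to reclaim tokens.
        head, tail = history[: len(history) // 2], history[len(history) // 2:]
        summary = llm(
            "Summarize the following text in a few short sentences:\n" + head,
            max_tokens=128,
        )["choices"][0]["text"]
        history = summary.strip() + "\n" + tail
    return history + "\n" + question
```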
If you want to improve the results of your language model, you should become a mentalist/con-man/magician/social engineer. It sounds weird, but it works!
Nothing about what you’re saying actually deals with this non-obvious limitation of chain-of-thought:
> Examples like this suggest that transformers wouldn’t gain much from using just a few intermediate steps. Indeed, Merrill and Sabharwal proved that chain of thought only really begins to help when the number of intermediate steps grows in proportion to the size of the input, and many problems require the number of intermediate steps to grow much larger still.
This aligns with my experience: GPT-4 can only break down "simple" problems when prompted to solve them step by step. In particular, if the actual steps need to be broken down further (O(n^2) complexity), GPT-4 can't handle it reliably: it will break a task into steps, but it struggles to break subtasks into substeps, even when it can otherwise solve each subtask with CoT prompting.
CoT prompting works for simple O(n) computations because it prevents LLMs from blindly guessing the answer, but they are theoretically (and IMO empirically) incapable of breaking an O(n^2) problem down into O(n) separate O(n) subproblems. Needless to say, humans are quite a bit smarter than that. (So are mice!)
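To make the scaling argument concrete (a toy illustration, not a claim about any specific model): for an O(n^2) task like counting inversions, a faithful chain of thought has to write out roughly n^2/2 comparison steps, and each of those is itself a small subproblem the model must not skip.

```python
def inversion_cot(xs: list[int]) -> str:
    """Spell out every pairwise comparison -- the 'intermediate steps'."""
    steps, inversions = [], 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            inv = xs[i] > xs[j]
            inversions += inv
            steps.append(f"Step {len(steps) + 1}: is x[{i}]={xs[i]} > x[{j}]={xs[j]}? "
                         f"{'yes' if inv else 'no'}")
    steps.append(f"Answer: {inversions} inversions.")
    return "\n".join(steps)

for n in (4, 8, 16):
    n_steps = inversion_cot(list(range(n, 0, -1))).count("\n")
    print(f"{n} elements -> {n_steps} comparison steps")
    # prints 6, 28, 120: the transcript grows like n*(n-1)/2
```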