It's all just about awareness of context. Want to improve it? Simply add a term to the prompt to unlock more considerations. Assuming we haven't reached the edge of the context window, every new word "unlocks" new vectors with more context that the language model adds to its considerations.
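To make that concrete, here's a trivial sketch of what I mean by "adding a term": prepend a couple of explicit context words so the model has something to latch onto. The helper name and wording are purely illustrative, not any particular library's API.

```python
def enrich(prompt: str, context_terms: list[str]) -> str:
    """Prepend explicit context terms so the model attends to them first."""
    return "Context: " + ", ".join(context_terms) + "\n\n" + prompt

# "file permissions" and "sudo" are the extra terms that unlock the
# relevant considerations for this question.
print(enrich("Why can't my script read /etc/shadow?",
             ["Linux", "file permissions", "sudo"]))
```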
The similarity with how the human brain (seems to) work is so remarkable that it doesn't even make sense not to use it as an analogue for how to better use language models.
When the results can be achieved the same way (manipulating an LLM with the right words, just as you'd manipulate a human brain), why believe there's a difference?
This is stuff one can learn over time by using and researching 3B models. While most people seem to shun them, some of them are extremely powerful, like the "old" orca mini 3B. I am still using that one! All they really need is better prompts, and with that approach they work perfectly fine.
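For anyone who wants to try it, a minimal sketch of running orca mini 3B locally via llama-cpp-python. The GGUF file name is a placeholder, and the `### System / ### User / ### Response` template is what I believe orca mini expects; check the model card for your particular build.

```python
from llama_cpp import Llama

# Placeholder path: point this at whatever GGUF build of orca-mini 3B you have.
llm = Llama(model_path="orca-mini-3b.q4_0.gguf", n_ctx=2048)

prompt = (
    "### System:\n"
    "You are a careful assistant. Answer in the context of Linux file permissions.\n\n"
    "### User:\n"
    "Why can't my script read /etc/shadow?\n\n"
    "### Response:\n"
)

out = llm(prompt, max_tokens=200, stop=["### User:"])
print(out["choices"][0]["text"].strip())
```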
The biggest hurdle I've found is the usually small context window of such small models, but there are ways of cheating around that without sacrificing too much quality: a small RoPE extension, summarizing text, adding context words, or leaving letters out of words in the prompt, all of which virtually increase the size of the context window.
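Here's a sketch of two of those cheats combined, again assuming llama-cpp-python: scale RoPE a little to stretch the window, and summarize the oldest part of the prompt with the model itself when it would overflow. The exact numbers are arbitrary and should be tuned against your quality tolerance.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="orca-mini-3b.q4_0.gguf",  # placeholder path
    n_ctx=4096,           # ask for more positions than the model was trained with...
    rope_freq_scale=0.5,  # ...and compress RoPE so they still map into range (costs some quality)
)

def fit_prompt(history: str, question: str, budget: int = 3500) -> str:
    """Shrink `history` until history + question fits the token budget."""
    ntok = lambda text: len(llm.tokenize(text.encode("utf-8")))
    while history and ntok(history) + ntok(question) > budget:
        # Summarize the oldest half of the history (split by characters,
        # which is crude but fine for a sketch) to reclaim tokens.
        head, tail = history[: len(history) // 2], history[len(history) // 2:]
        summary = llm(
            "Summarize the following text in a few short sentences:\n" + head,
            max_tokens=128,
        )["choices"][0]["text"]
        history = summary.strip() + "\n" + tail
    return history + "\n" + question
```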
If you want to improve the results of your language model, you should become a mentalist/con-man/magician/social engineer. It sounds weird, but it works!
Nothing about what you’re saying actually deals with this non-obvious limitation of chain-of-thought:
> Examples like this suggest that transformers wouldn’t gain much from using just a few intermediate steps. Indeed, Merrill and Sabharwal proved that chain of thought only really begins to help when the number of intermediate steps grows in proportion to the size of the input, and many problems require the number of intermediate steps to grow much larger still.
This aligns with my experience: GPT-4 can only break down "simple" problems when prompted to solve them step by step. In particular, if the actual steps need to be broken down further (O(n^2) complexity), GPT-4 can't handle it reliably: it will break a task into steps, but it struggles to break subtasks into substeps, even when it can otherwise solve each subtask with CoT prompting.
CoT prompting works for simple O(n) computations because it prevents LLMs from blindly guessing the answer, but they are theoretically (and IMO empirically) incapable of breaking an O(n^2) problem down into O(n) separate O(n) subproblems. Needless to say, humans are quite a bit smarter than that. (So are mice!)
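To make the scaling argument concrete (a toy illustration, not a claim about any specific model): for an O(n^2) task like counting inversions, a faithful chain of thought has to write out roughly n^2/2 comparison steps, and each of those is itself a small subproblem the model must not skip.

```python
def inversion_cot(xs: list[int]) -> str:
    """Spell out every pairwise comparison -- the 'intermediate steps'."""
    steps, inversions = [], 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            inv = xs[i] > xs[j]
            inversions += inv
            steps.append(f"Step {len(steps) + 1}: is x[{i}]={xs[i]} > x[{j}]={xs[j]}? "
                         f"{'yes' if inv else 'no'}")
    steps.append(f"Answer: {inversions} inversions.")
    return "\n".join(steps)

for n in (4, 8, 16):
    n_steps = inversion_cot(list(range(n, 0, -1))).count("\n")
    print(f"{n} elements -> {n_steps} comparison steps")
    # prints 6, 28, 120: the transcript grows like n*(n-1)/2
```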