Large language models (LLMs) are restricted to reasoning in the “language space”, where they typically express the reasoning process with a chain-of-thought (CoT) to solve complex reasoning problems.
However, we argue that language space may not always be optimal for reasoning. For example, most
word tokens are primarily for textual coherence and not essential for reasoning, while some critical
tokens require complex planning and pose significant challenges to LLMs. To explore the potential of
LLM reasoning in an unrestricted latent space instead of natural language, we introduce a new
paradigm, Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM
as a representation of the reasoning state (termed “continuous thought”). Rather than decoding this
into a word token, we feed it back to the LLM as the subsequent input embedding directly in the
continuous space...
https://arxiv.org/pdf/2412.06769
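
The feedback loop described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes a GPT-2 backbone loaded via Hugging Face transformers and a fixed, hypothetical number of latent steps, and it omits Coconut's training curriculum and the special tokens that delimit the latent reasoning segment.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

prompt = "Question: ... Reasoning:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# Embed the prompt tokens once; all later steps operate on embeddings.
embeds = model.get_input_embeddings()(input_ids)  # (1, seq_len, hidden)

num_latent_steps = 4  # hypothetical count of continuous thoughts
with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=embeds)
        # The last hidden state at the final position plays the role of the
        # "continuous thought": a reasoning state never decoded into a token.
        thought = out.last_hidden_state[:, -1:, :]  # (1, 1, hidden)
        # Feed it back as the next input embedding, staying in continuous
        # space (dimensionally valid for GPT-2, where the hidden size equals
        # the embedding size).
        embeds = torch.cat([embeds, thought], dim=1)
```

Because the hidden state and the input embedding share the same dimensionality in this backbone, no projection is needed, and the loop skips token decoding and sampling entirely, which is the core of the mechanism the abstract describes.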