KaVa: Latent Reasoning via Compressed KV-Cache Distillation

Kuzina, Anna, Pioro, Maciej, Whatmough, Paul N., Bejnordi, Babak Ehteshami

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) excel at multi-step reasoning problems with explicit chain-of-thought (CoT), but verbose traces incur significant computational costs and memory overhead, and often carry redundant, stylistic artifacts. Latent reasoning has emerged as an efficient alternative that internalizes the thought process, but it suffers from a critical lack of supervision, which limits its effectiveness on complex, natural-language reasoning traces. We show that the abstract, unstructured knowledge within a compressed KV-cache, which lacks direct token correspondence, can serve as a rich supervisory signal for a latent reasoning student. Empirically, the approach consistently outperforms strong latent baselines, exhibits markedly smaller degradation from equation-only to natural-language traces, and scales to larger backbones while preserving efficiency. These results establish compressed KV-cache distillation as a scalable supervision signal for latent reasoning, combining the accuracy of CoT-trained teachers with the efficiency and deployability of latent inference.

Recent advancements in Large Language Models (LLMs) have demonstrated remarkable capabilities in solving complex problems across domains such as mathematics (Zhang et al., 2025), science (Phan et al., 2025), and code generation (Hui et al., 2024). A key driver of this progress is "chain-of-thought" (CoT) training that elicits intermediate steps before the final answer, improving accuracy on long-horizon inference problems (DeepSeek-AI et al., 2025). Yet explicit CoT often incurs substantial inference cost due to long, verbose traces and the associated key-value (KV) cache growth, making deployment on memory- and compute-constrained devices difficult. Furthermore, CoT traces, especially those distilled from larger models, can inherit and amplify biases or contain plausible-sounding but fallacious logic, limiting their reliability.
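To make the KV-cache growth concrete, the snippet below applies the standard estimate for a decoder-only transformer: every generated token stores one key and one value vector per layer and per KV head. The model configuration (32 layers, 8 KV heads, head dimension 128, fp16/bf16 storage) is a hypothetical example chosen for illustration, not a model used in the paper; the point is only that cache memory grows linearly with trace length.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate the bytes needed to cache keys and values for seq_len tokens.

    The leading factor 2 accounts for storing both K and V.
    bytes_per_elem=2 assumes fp16/bf16 storage (an assumption).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical decoder configuration (illustrative only, not from the paper).
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

# Longer reasoning traces mean proportionally larger caches.
for tokens in (64, 512, 4096):
    mib = kv_cache_bytes(seq_len=tokens, **cfg) / 2**20
    print(f"{tokens:>5} tokens -> {mib:8.1f} MiB")
```

Under these assumptions each token costs 128 KiB of cache, so a 64-token answer needs 8 MiB while a 4096-token CoT trace needs 512 MiB per sequence, which is the kind of overhead that latent reasoning aims to avoid.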
