CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision
Neural Information Processing Systems
Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which augments training data with intermediate reasoning steps to provide a richer learning signal, has driven recent advances in large language model reasoning. This paper develops a statistical theory of learning under CoT supervision. Central to the theory is the CoT information, which measures the additional discriminative power offered by the chain-of-thought for distinguishing hypotheses with different end-to-end behaviors. The main theoretical results demonstrate how CoT supervision can yield significantly faster learning rates compared to standard end-to-end supervision, with both upper bounds and information-theoretic lower bounds characterized by the CoT information.
Jun-15-2026, 16:01:48 GMT
- Country:
- North America > United States > California (0.45)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.87)
- Technology: