Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn
arXiv.org Artificial Intelligence
We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning required to arrive at a particular CoT. We present empirical evidence from state-of-the-art models exhibiting behaviors consistent with in-context search, and explore methods for producing Meta-CoT via process supervision, synthetic data generation, and search algorithms. We then outline a concrete pipeline for training a model to produce Meta-CoTs, incorporating instruction tuning with linearized search traces and reinforcement learning post-training. Finally, we discuss open research questions, including scaling laws, verifier roles, and the potential for discovering novel reasoning algorithms. This work provides a theoretical and practical roadmap to enable Meta-CoT in LLMs, paving the way for more powerful and human-like reasoning in artificial intelligence.
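The pipeline described in the abstract pairs instruction tuning on linearized search traces with reinforcement learning post-training. As a rough illustration of what linearizing a search trace can mean, the sketch below flattens a toy search tree, including dead ends and explicit backtracking markers, into a single text sequence that could serve as a supervised training target. The Node class, tag strings, and linearize_trace function are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed names and tag format, not the paper's code):
# flatten an explored search tree into one text sequence so a model can be
# instruction-tuned on the search process itself, not just the final CoT.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    thought: str                       # intermediate reasoning step
    children: List["Node"] = field(default_factory=list)
    is_solution: bool = False          # marks a node that reaches the answer

def linearize_trace(root: Node) -> str:
    """Depth-first traversal that emits every explored branch in order,
    inserting <backtrack> markers when a branch fails."""
    parts: List[str] = []

    def visit(node: Node) -> bool:
        parts.append(f"<step> {node.thought}")
        if node.is_solution:
            parts.append("<solution reached>")
            return True
        for child in node.children:
            if visit(child):
                return True
            parts.append("<backtrack>")   # failed branch: return to parent
        return False

    visit(root)
    return "\n".join(parts)

# Toy example: one dead-end branch, then the successful one.
root = Node(
    "Try factoring the quadratic",
    children=[
        Node("Guess roots 2 and 3 -> does not match constant term"),
        Node("Use the quadratic formula",
             children=[Node("Roots are 1 and 6", is_solution=True)]),
    ],
)
print(linearize_trace(root))
```

On this toy input the output interleaves explored steps with a <backtrack> marker before the successful branch, which is the kind of sequence a model could then be fine-tuned to produce.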
Jan-8-2025