Understanding Chain-of-Thought in LLMs through Information Theory

Jean-Francois Ton, Muhammad Faaiz Taufiq, Yang Liu

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, from complex reasoning to code generation [Chowdhery et al., 2024, OpenAI et al., 2024, Bubeck et al., 2023, Anil et al., 2023]. Many of these advances can be attributed to Chain-of-Thought (CoT) reasoning [Wei et al., 2024, Nye et al., 2021, Li et al., 2024], which breaks a complex problem into a series of intermediate steps, mirroring human-like reasoning processes. The success of CoT reasoning, particularly in domains such as mathematics, logic, and multi-step decision-making, has led researchers and developers to incorporate CoT-like features directly into model training, e.g. the FLAN family of models [Chung et al., 2022, Wei et al., 2022]. This paper introduces a new formal framework for analyzing CoT in LLMs. We provide a rigorous method, grounded in information theory, to evaluate the quality of each step in a model's reasoning process, offering insights beyond simple accuracy metrics and helping identify where the reasoning can be improved.
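To make the idea of scoring individual reasoning steps concrete, the sketch below illustrates one simple way such a per-step quantity could be computed: treat each CoT step as informative to the extent that it increases the model's probability of the correct final answer, measured in bits. This is only an illustrative toy, not the paper's actual estimator; the function name, the way the probabilities are obtained, and the example numbers are all assumptions introduced here for exposition.

```python
import math


def per_step_information_gain(answer_probs):
    """Toy per-step score for a chain of thought (illustrative only).

    answer_probs[k] is assumed to be P(correct answer | CoT steps 1..k),
    e.g. estimated by querying a model after each partial chain of thought;
    answer_probs[0] is the probability before any reasoning step.

    Returns gain_k = log2(p_k) - log2(p_{k-1}) for each step k, i.e. how
    many bits of uncertainty about the answer that step removed. A gain
    near zero (or negative) suggests an uninformative or misleading step.
    """
    gains = []
    for prev, curr in zip(answer_probs, answer_probs[1:]):
        gains.append(math.log2(curr) - math.log2(prev))
    return gains


if __name__ == "__main__":
    # Hypothetical probabilities of the correct answer after 0..4 CoT steps.
    probs = [0.05, 0.10, 0.40, 0.38, 0.90]
    for k, g in enumerate(per_step_information_gain(probs), start=1):
        flag = "  <-- step adds little or misleads" if g <= 0 else ""
        print(f"step {k}: information gain = {g:+.2f} bits{flag}")
```

Under this toy measure, a step with negative gain (step 3 in the example) would be flagged for inspection even if the final answer ends up correct, which is the kind of step-level signal that simple end-to-end accuracy cannot provide.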