From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang

arXiv.org Machine Learning 

Chain-of-thought (CoT) has proven to be a powerful technique for enhancing reasoning in large language models [29, 63]. By instructing the model to break complex problems into smaller, manageable steps, CoT enables more efficient reasoning and better generalization, particularly on algorithmic and logical tasks [32, 45, 60]. Building on this, performance can be further improved through multi-step prompting and multi-path sampling techniques [10, 20, 59, 74, 75]. This focus on CoT within in-context learning has since expanded to more structured learning approaches [6, 69]: adding CoT-style reasoning examples to the instruction-tuning dataset improves a model's problem-solving abilities more effectively than relying on CoT only at prompting time [11, 72]. As a result, CoT is shaping a new paradigm in language model development, marking a shift from simply scaling data [22, 25] toward advanced reasoning strategies [39] that lead to more effective learning.

While CoT's empirical success is well established, why it works remains actively debated [48, 51]. Recent theoretical studies suggest that CoT enhances a model's expressiveness, increasing its representational capacity when the generated sequence is long enough [18, 37]. However, expressivity alone does not guarantee success.
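To make the step-by-step decomposition described above concrete, the following minimal sketch contrasts a direct prompt with a CoT-style prompt. The question, wording, and variable names are illustrative choices only and are not drawn from the cited works or from this paper's experiments.

```python
# Illustrative contrast between direct prompting and chain-of-thought prompting.
# The question and phrasing below are hypothetical examples.

question = (
    "A store sells pencils in boxes of 12. "
    "If Ada buys 7 boxes and gives away 20 pencils, how many does she keep?"
)

# Direct prompting: request only the final answer.
direct_prompt = f"{question}\nAnswer:"

# Chain-of-thought prompting: instruct the model to reason in explicit
# intermediate steps before committing to an answer.
cot_prompt = (
    f"{question}\n"
    "Let's think step by step.\n"
    "Step 1: Compute the total number of pencils bought.\n"
    "Step 2: Subtract the pencils given away.\n"
    "Answer:"
)

print(direct_prompt)
print()
print(cot_prompt)
```

In the CoT version, the model is asked to emit intermediate reasoning tokens, which is precisely the longer generated sequence that the expressivity results cited above rely on.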