CoTFormer: More Tokens With Attention Make Up For Less Depth

Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi

arXiv.org Artificial Intelligence 

The race to continually develop ever larger and deeper foundational models is underway. However, techniques like the Chain-of-Thought (CoT) method continue to play a pivotal role in achieving optimal downstream performance. In this work, we establish an approximate parallel between using chain-of-thought and employing a deeper transformer. Building on this insight, we introduce CoTFormer, a transformer variant that employs an implicit CoT-like mechanism to achieve capacity comparable to a deeper model. Our empirical findings demonstrate the effectiveness of CoTFormers, as they significantly outperform larger standard transformers.
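The abstract does not spell out the mechanism, so the following is only a minimal illustrative sketch of the stated idea: rather than stacking more distinct layers, a shared block is re-applied, and intermediate token representations are added to the attention context so later passes can attend to them, loosely mirroring how chain-of-thought appends generated tokens. All names and hyperparameters (CoTFormerSketch, n_repeats, the concatenation scheme) are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn


class CoTFormerSketch(nn.Module):
    """Hypothetical sketch of an implicit CoT-like transformer:
    one shared block applied repeatedly, with intermediate
    representations appended to the attention context."""

    def __init__(self, d_model=256, nhead=4, n_layers=2, n_repeats=3):
        super().__init__()
        # A small shared block of standard transformer layers (reused each repeat).
        self.block = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers=n_layers,
        )
        self.n_repeats = n_repeats

    def forward(self, x):  # x: (batch, seq, d_model)
        context = x
        hidden = x
        for _ in range(self.n_repeats):
            # Append the latest intermediate representations to the context,
            # so attention spans both original and intermediate "tokens"
            # (more tokens with attention instead of more depth).
            context = torch.cat([context, hidden], dim=1)
            hidden = self.block(context)[:, -x.size(1):]
        return hidden


if __name__ == "__main__":
    model = CoTFormerSketch()
    out = model(torch.randn(2, 16, 256))
    print(out.shape)  # torch.Size([2, 16, 256])
```

With n_repeats passes through the same block, parameter count stays that of the shallow block while the effective computation per token grows, which is the trade-off the title alludes to; the actual CoTFormer design may interleave and weight these tokens differently.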
