Understanding Transformer Reasoning Capabilities via Graph Algorithms

May-27-2025, 08:37:22 GMT–Neural Information Processing Systems

Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While transformer-based neural networks have achieved tremendous empirical advances, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, its width, and the number of extra tokens available for algorithm execution. We prove that logarithmic depth is necessary and sufficient for tasks like graph connectivity, while single-layer transformers with small embedding dimensions can solve contextual retrieval tasks. We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark.
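A classical analogy can build intuition for why depth that grows logarithmically with graph size suffices for connectivity. Boolean reachability can be computed by repeatedly squaring the adjacency matrix (with self-loops). Each squaring doubles the maximum path length covered, so about log2(n) rounds settle reachability on n nodes. The sketch below is plain Python that illustrates this doubling argument only. It is not the paper's transformer construction, and the function name `connectivity_by_squaring` is hypothetical.

```python
def connectivity_by_squaring(n: int, edges: list[tuple[int, int]]) -> tuple[list[list[bool]], int]:
    """Return (reach, rounds) for an undirected graph on nodes 0..n-1.

    reach[i][j] is True iff j is reachable from i; rounds counts the
    Boolean matrix squarings performed (hypothetical helper, for illustration).
    """
    # Paths of length <= 1: every node reaches itself, plus each undirected edge.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        reach[u][v] = True
        reach[v][u] = True

    rounds = 0
    span = 1  # reach currently accounts for all paths with at most `span` edges
    # A simple path has at most n - 1 edges, so stop once span covers that.
    while span < n - 1:
        # Boolean squaring: i reaches j within 2*span edges if some k is
        # reachable from i within span edges and reaches j within span edges.
        reach = [
            [any(reach[i][k] and reach[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        span *= 2
        rounds += 1
    return reach, rounds
```

On a path of 8 nodes, the loop runs 3 rounds (span 1 → 2 → 4 → 8), which matches ceil(log2(7)) = 3. After those rounds, node 0 is marked as reaching node 7. The number of sequential rounds grows only logarithmically in n, even though each round does a lot of parallel work. That parallel-depth view is the intuition the abstract's logarithmic-depth result echoes.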

artificial intelligence, machine learning, transformer reasoning capability
