Limits of Transformer Language Models on Algorithmic Learning

Jonathan Thomm, Aleksandar Terzic, Geethan Karunaratne, Giacomo Camposampiero, Bernhard Schölkopf, Abbas Rahimi

arXiv.org Artificial Intelligence 

We analyze the capabilities of Transformer language models in learning discrete algorithms. To this end, we introduce two new tasks that demand the composition of several discrete sub-tasks. Both by training LLaMA models from scratch and by prompting GPT-4 and Gemini, we measure how well these models learn compositions of already-learned primitives. We observe that the compositional capabilities of state-of-the-art Transformer language models are very limited and, in terms of sample efficiency, scale worse than re-learning all sub-tasks for a new algorithmic composition. We also present a theorem in complexity theory showing that gradient descent on memorizing feedforward models can be exponentially data-inefficient.
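
To make the notion of "composition of discrete sub-tasks" concrete, here is a minimal, hypothetical sketch of how such a task could be generated: a few sub-task primitives (copy, reverse, sort) act on token sequences, and an example asks the model to apply a chosen composition of them. The primitive set, the `make_example` helper, and the prompt format are illustrative assumptions, not the paper's actual task definitions.

```python
# Hypothetical sketch of a compositional algorithmic task: discrete sub-task
# primitives are applied in sequence to an input token list. This is an
# illustration of the task family described in the abstract, not the paper's
# actual benchmark.

import random

# Discrete sub-task primitives operating on lists of integer tokens
# (assumed primitives, chosen for illustration only).
PRIMITIVES = {
    "copy":    lambda seq: list(seq),
    "reverse": lambda seq: list(reversed(seq)),
    "sort":    lambda seq: sorted(seq),
}

def make_example(composition, seq_len=8, vocab=range(10)):
    """Build one (prompt, target) pair for a given composition of primitives."""
    seq = [random.choice(list(vocab)) for _ in range(seq_len)]
    out = seq
    for name in composition:  # apply the sub-tasks left to right
        out = PRIMITIVES[name](out)
    prompt = " ".join(composition) + " : " + " ".join(map(str, seq))
    target = " ".join(map(str, out))
    return prompt, target

if __name__ == "__main__":
    random.seed(0)
    # A composition of two learned primitives, e.g. "reverse, then sort".
    print(make_example(["reverse", "sort"]))
```

Under this kind of setup, the sample-efficiency question in the abstract amounts to asking whether a model that has seen each primitive in isolation needs fewer examples to learn a new composition than it would need to learn all of the sub-tasks from scratch.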