Limits of Transformer Language Models on Algorithmic Learning

Jonathan Thomm, Aleksandar Terzic, Geethan Karunaratne, Giacomo Camposampiero, Bernhard Schölkopf, Abbas Rahimi

arXiv.org Artificial Intelligence 

We analyze the capabilities of Transformer language models in learning discrete algorithms. To this end, we introduce two new tasks that demand the composition of several discrete sub-tasks. Both by training LLaMA models from scratch and by prompting GPT-4 and Gemini, we measure how well these models learn compositions of already-learned primitives. We observe that the compositional capabilities of state-of-the-art Transformer language models are very limited and, in terms of sample efficiency, scale worse than re-learning all sub-tasks for a new algorithmic composition. We also present a theorem in complexity theory showing that gradient descent on memorizing feedforward models can be exponentially data-inefficient.
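
To make the notion of "composition of discrete sub-tasks" concrete, here is a minimal, hypothetical sketch of how such a task could be generated: a few sub-task primitives (copy, reverse, sort) act on token sequences, and an example asks the model to apply a chosen composition of them. The primitive set, the `make_example` helper, and the prompt format are illustrative assumptions, not the paper's actual task definitions.

```python
# Hypothetical sketch of a compositional algorithmic task: discrete sub-task
# primitives are applied in sequence to an input token list. This is an
# illustration of the task family described in the abstract, not the paper's
# actual benchmark.

import random

# Discrete sub-task primitives operating on lists of integer tokens
# (assumed primitives, chosen for illustration only).
PRIMITIVES = {
    "copy":    lambda seq: list(seq),
    "reverse": lambda seq: list(reversed(seq)),
    "sort":    lambda seq: sorted(seq),
}

def make_example(composition, seq_len=8, vocab=range(10)):
    """Build one (prompt, target) pair for a given composition of primitives."""
    seq = [random.choice(list(vocab)) for _ in range(seq_len)]
    out = seq
    for name in composition:  # apply the sub-tasks left to right
        out = PRIMITIVES[name](out)
    prompt = " ".join(composition) + " : " + " ".join(map(str, seq))
    target = " ".join(map(str, out))
    return prompt, target

if __name__ == "__main__":
    random.seed(0)
    # A composition of two learned primitives, e.g. "reverse, then sort".
    print(make_example(["reverse", "sort"]))
```

Under this kind of setup, the sample-efficiency question in the abstract amounts to asking whether a model that has seen each primitive in isolation needs fewer examples to learn a new composition than it would need to learn all of the sub-tasks from scratch.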