When can transformers reason with abstract symbols?

Boix-Adsera, Enric, Saremi, Omid, Abbe, Emmanuel, Bengio, Samy, Littwin, Etai, Susskind, Joshua

Oct-15-2023–arXiv.org Artificial Intelligence

We investigate the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols. Such tasks have long been studied in the neuroscience literature as fundamental building blocks for more complex abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications which can reduce the amount of data needed by adding two trainable parameters per head.

large language model, machine learning, natural language

Country:
- North America > United States (0.45)

Genre:
- Research Report (0.49)

Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Cognitive Science (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)
  - Natural Language > Large Language Model (0.87)
