The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas
arXiv.org Artificial Intelligence
Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm (previously described by Nanda et al. [1]); others implement a previously undescribed, less intuitive, but comprehensible procedure we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.
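To make the task concrete, the following is a minimal sketch of the idea behind the Clock algorithm for modular addition: represent each input as a rotation on a circle, compose the two rotations with the angle-addition identities, and pick the output whose rotation best matches the result. It illustrates the arithmetic only and is not the trained network or the authors' code; the function name `clock_add`, the frequency parameter `k`, and the modulus used in the usage note are assumptions made for this example.

```python
import math


def clock_add(a: int, b: int, p: int, k: int = 1) -> int:
    """Compute (a + b) mod p by composing rotations on a circle.

    Hypothetical helper for illustration; k is an assumed frequency
    that should be coprime with p so the best match is unique.
    """
    w = 2 * math.pi * k / p  # angular step per unit of input

    # Place each input on the unit circle.
    cos_a, sin_a = math.cos(w * a), math.sin(w * a)
    cos_b, sin_b = math.cos(w * b), math.sin(w * b)

    # Angle-addition identities give the point for a + b.
    cos_sum = cos_a * cos_b - sin_a * sin_b
    sin_sum = sin_a * cos_b + cos_a * sin_b

    # Score each candidate c by cos(w * (a + b - c)), expanded so that
    # only the composed point and the candidate's point are needed.
    scores = [
        cos_sum * math.cos(w * c) + sin_sum * math.sin(w * c)
        for c in range(p)
    ]

    # The candidate aligned with the composed rotation scores highest.
    return max(range(p), key=scores.__getitem__)
```

For example, with an arbitrarily chosen modulus of 59, `clock_add(40, 30, 59)` agrees with `(40 + 30) % 59`. The score peaks where `a + b - c` is a multiple of `p`, which is why the argmax recovers the modular sum.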
Nov-21-2023
- Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Genre:
- Research Report > New Finding (0.86)
- Industry:
- Education (0.48)