

Towards Understanding Grokking: An Effective Theory of Representation Learning

Neural Information Processing Systems

We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations, whose training dynamics and dependence on training set size can be predicted by our effective theory (in a toy setting). We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a "Goldilocks zone" (including comprehension and grokking) between memorization and confusion. Compared to the comprehension phase, the grokking phase stays closer to the memorization phase, leading to delayed generalization. The Goldilocks phase is reminiscent of "intelligence from starvation" in Darwinian evolution, where resource limitations drive discovery of more efficient solutions. This study not only provides intuitive explanations of the origin of grokking, but also highlights the usefulness of physics-inspired tools, e.g., effective theories and phase diagrams, for understanding deep learning.
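The toy setting in which grokking is typically studied is a small algorithmic task such as modular arithmetic, where the training-set fraction is the hyperparameter whose variation traces out the phases above. A minimal sketch of such a dataset (our own illustrative construction, not the paper's code):

```python
# Toy setup for studying grokking: learn modular addition (a + b) mod p.
# Scanning train_frac between 0 and 1 is how phase diagrams over
# training-set size are produced in this kind of experiment.
import random

def modular_addition_dataset(p=7, train_frac=0.5, seed=0):
    """All p*p pairs (a, b) with label (a + b) % p, split into train/test."""
    pairs = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_train = int(train_frac * len(pairs))
    return pairs[:n_train], pairs[n_train:]

train, test = modular_addition_dataset(p=7, train_frac=0.5)
print(len(train), len(test))  # 24 25
```

Because the full input space has only p² points, train and test sets together exhaust it, which is what makes memorization versus generalization cleanly measurable.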




Large Language Models and Emergence: A Complex Systems Perspective

Krakauer, David C., Krakauer, John W., Mitchell, Melanie

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are deep neural networks that, through training on huge amounts of text, learn to accurately predict the next word (or token) in a text. It has been surprising to many that next-token prediction has led to impressive abilities, such as learning of syntax, code generation, writing in any style, and factual recall. It has been claimed in the LLM literature that, as the number of network parameters and amount of training data is scaled up, certain capabilities arise suddenly and unexpectedly, a phenomenon that these writers term "emergence". For example, Wei et al. [1] write, "we define emergent abilities of large language models as abilities that are not present in smaller-scale models but are present in large-scale models; thus they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models." And in a recent review of emergent abilities in LLMs, Berti et al. [2] survey around 100 papers, the majority of which equate emergence with the discontinuous appearance of abilities with increasing data or model size.
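The next-token prediction objective described above can be illustrated with a count-based bigram model; this is a deliberate, drastic simplification of an LLM (our own sketch), keeping only the task itself:

```python
# Minimal next-token predictor: a count-based bigram model.
# LLMs perform the same task with deep networks over subword tokens;
# this pure-Python sketch keeps only the prediction objective.
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequently observed token after `prev`."""
    return counts[prev].most_common(1)[0][0]

tokens = "the cat sat on the mat".split()
model = train_bigram(tokens)
print(predict_next(model, "cat"))  # sat
```

The gap between this counting scheme and an LLM is exactly where the emergence debate lives: scaling up the predictor yields abilities that no bigram table exhibits.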


Neural Network Learning and Quantum Gravity

Lanza, Stefano

arXiv.org Artificial Intelligence

The landscape of low-energy effective field theories stemming from string theory is too vast for a systematic exploration. However, the meadows of the string landscape may be fertile ground for the application of machine learning techniques. Employing neural network learning may allow for inferring novel, undiscovered properties that consistent theories in the landscape should possess, or checking conjectural statements about alleged characteristics thereof. The aim of this work is to describe to what extent the string landscape can be explored with neural network-based learning. Our analysis is motivated by recent studies that show that the string landscape is characterized by finiteness properties, emerging from its underlying tame, o-minimal structures. Indeed, employing these results, we illustrate that any low-energy effective theory of string theory is endowed with certain statistical learnability properties. Consequently, several learning problems therein formulated, including interpolations and multi-class classification problems, can be concretely addressed with machine learning, delivering results with sufficiently high accuracy.


When Representations Align: Universality in Representation Learning Dynamics

van Rossem, Loek, Saxe, Andrew M.

arXiv.org Artificial Intelligence

Deep neural networks come in many sizes and architectures. The choice of architecture, in conjunction with the dataset and learning algorithm, is commonly understood to affect the learned neural representations. Yet, recent results have shown that different architectures learn representations with striking qualitative similarities. Here we derive an effective theory of representation learning under the assumption that the encoding map from input to hidden representation and the decoding map from representation to output are arbitrary smooth functions. This theory schematizes representation learning dynamics in the regime of complex, large architectures, where hidden representations are not strongly constrained by the parametrization. We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the "rich" and "lazy" regimes. While many network behaviors depend quantitatively on architecture, our findings point to certain behaviors that are widely conserved once models are sufficiently flexible.


GenEFT: Understanding Statics and Dynamics of Model Generalization via Effective Theory

Baek, David D., Liu, Ziming, Tegmark, Max

arXiv.org Artificial Intelligence

We present GenEFT: an effective theory framework for shedding light on the statics and dynamics of neural network generalization, and illustrate it with graph learning examples. We first investigate the generalization phase transition as data size increases, comparing experimental results with information-theory-based approximations. We find generalization in a Goldilocks zone where the decoder is neither too weak nor too powerful. We then introduce an effective theory for the dynamics of representation learning, where latent-space representations are modeled as interacting particles ("repons"), and find that it explains our experimentally observed phase transition between generalization and overfitting as encoder and decoder learning rates are scanned. This highlights the power of physics-inspired effective theories for bridging the gap between theoretical predictions and practice in machine learning.