

Towards Understanding Grokking: An Effective Theory of Representation Learning

Neural Information Processing Systems

We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations, whose training dynamics and dependence on training set size can be predicted by our effective theory (in a toy setting). We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a "Goldilocks zone" (including comprehension and grokking) between memorization and confusion. Compared to the comprehension phase, the grokking phase stays closer to the memorization phase, leading to delayed generalization. The Goldilocks phase is reminiscent of "intelligence from starvation" in Darwinian evolution, where resource limitations drive discovery of more efficient solutions. This study not only provides intuitive explanations of the origin of grokking, but also highlights the usefulness of physics-inspired tools, e.g., effective theories and phase diagrams, for understanding deep learning.
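The toy setting in which grokking is typically studied is a small algorithmic task such as modular arithmetic, where the training-set fraction is the hyperparameter whose variation traces out the phases above. A minimal sketch of such a dataset (our own illustrative construction, not the paper's code):

```python
# Toy setup for studying grokking: learn modular addition (a + b) mod p.
# Scanning train_frac between 0 and 1 is how phase diagrams over
# training-set size are produced in this kind of experiment.
import random

def modular_addition_dataset(p=7, train_frac=0.5, seed=0):
    """All p*p pairs (a, b) with label (a + b) % p, split into train/test."""
    pairs = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_train = int(train_frac * len(pairs))
    return pairs[:n_train], pairs[n_train:]

train, test = modular_addition_dataset(p=7, train_frac=0.5)
print(len(train), len(test))  # 24 25
```

Because the full input space has only p² points, train and test sets together exhaust it, which is what makes memorization versus generalization cleanly measurable.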




Large Language Models and Emergence: A Complex Systems Perspective

Krakauer, David C., Krakauer, John W., Mitchell, Melanie

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are deep neural networks that, through training on huge amounts of text, learn to accurately predict the next word (or token) in a text. It has been surprising to many that next-token prediction has led to impressive abilities, such as learning of syntax, code generation, writing in any style, and factual recall. It has been claimed in the LLM literature that, as the number of network parameters and amount of training data is scaled up, certain capabilities arise suddenly and unexpectedly, a phenomenon that these writers term "emergence". For example, Wei et al. [1] write, "we define emergent abilities of large language models as abilities that are not present in smaller-scale models but are present in large-scale models; thus they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models." And in a recent review of emergent abilities in LLMs, Berti et al. [2] survey around 100 papers, the majority of which equate emergence with the discontinuous appearance of abilities with increasing data or model size.
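The next-token prediction objective described above can be illustrated with a count-based bigram model; this is a deliberate, drastic simplification of an LLM (our own sketch), keeping only the task itself:

```python
# Minimal next-token predictor: a count-based bigram model.
# LLMs perform the same task with deep networks over subword tokens;
# this pure-Python sketch keeps only the prediction objective.
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequently observed token after `prev`."""
    return counts[prev].most_common(1)[0][0]

tokens = "the cat sat on the mat".split()
model = train_bigram(tokens)
print(predict_next(model, "cat"))  # sat
```

The gap between this counting scheme and an LLM is exactly where the emergence debate lives: scaling up the predictor yields abilities that no bigram table exhibits.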


Neural Network Learning and Quantum Gravity

Lanza, Stefano

arXiv.org Artificial Intelligence

The landscape of low-energy effective field theories stemming from string theory is too vast for a systematic exploration. However, the meadows of the string landscape may be fertile ground for the application of machine learning techniques. Employing neural network learning may allow for inferring novel, undiscovered properties that consistent theories in the landscape should possess, or checking conjectural statements about alleged characteristics thereof. The aim of this work is to describe to what extent the string landscape can be explored with neural network-based learning. Our analysis is motivated by recent studies that show that the string landscape is characterized by finiteness properties, emerging from its underlying tame, o-minimal structures. Indeed, employing these results, we illustrate that any low-energy effective theory of string theory is endowed with certain statistical learnability properties. Consequently, several learning problems therein formulated, including interpolations and multi-class classification problems, can be concretely addressed with machine learning, delivering results with sufficiently high accuracy.


When Representations Align: Universality in Representation Learning Dynamics

van Rossem, Loek, Saxe, Andrew M.

arXiv.org Artificial Intelligence

Deep neural networks come in many sizes and architectures. The choice of architecture, in conjunction with the dataset and learning algorithm, is commonly understood to affect the learned neural representations. Yet, recent results have shown that different architectures learn representations with striking qualitative similarities. Here we derive an effective theory of representation learning under the assumption that the encoding map from input to hidden representation and the decoding map from representation to output are arbitrary smooth functions. This theory schematizes representation learning dynamics in the regime of complex, large architectures, where hidden representations are not strongly constrained by the parametrization. We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the "rich" and "lazy" regimes. While many network behaviors depend quantitatively on architecture, our findings point to certain behaviors that are widely conserved once models are sufficiently flexible.


GenEFT: Understanding Statics and Dynamics of Model Generalization via Effective Theory

Baek, David D., Liu, Ziming, Tegmark, Max

arXiv.org Artificial Intelligence

We present GenEFT: an effective theory framework for shedding light on the statics and dynamics of neural network generalization, and illustrate it with graph learning examples. We first investigate the generalization phase transition as data size increases, comparing experimental results with information-theory-based approximations. We find generalization in a Goldilocks zone where the decoder is neither too weak nor too powerful. We then introduce an effective theory for the dynamics of representation learning, where latent-space representations are modeled as interacting particles ("repons"), and find that it explains our experimentally observed phase transition between generalization and overfitting as encoder and decoder learning rates are scanned. This highlights the power of physics-inspired effective theories for bridging the gap between theoretical predictions and practice in machine learning.