Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition
Zhongtian Chen, Edmund Lau, Jake Mendel, Susan Wei, Daniel Murfet
The apparent simplicity of the Toy Model of Superposition (TMS) proposed in Elhage et al. (2022) conceals a remarkably intricate phase structure. During training, a plateau in the loss is often followed by a sudden discrete drop, suggesting some development in the network's internal structure. To shed light on these transitions and their significance, this paper examines the dynamical transitions in TMS during SGD training and connects them to phase transitions of the Bayesian posterior with respect to the sample size n. While the former transitions have been observed in several recent works in deep learning (Olsson et al., 2022; McGrath et al., 2022; Wei et al., 2022a), their formal status has remained elusive. In contrast, phase transitions of the Bayesian posterior are mathematically well-defined in Singular Learning Theory (SLT) (Watanabe, 2009). Using SLT, we show formally that the Bayesian posterior is subject to an internal model selection mechanism in the following sense: for small training sample size n, the posterior prefers critical points with low complexity but potentially high loss; the opposite is true for large n, where the posterior prefers low-loss critical points at the cost of higher complexity. The measure of complexity here is very specific: it is the local learning coefficient λ of the critical points, first alluded to in Watanabe (2009, §7.6) and clarified recently in Lau et al. (2023). We can think of this internal model selection as a discrete dynamical process: at various critical sample sizes, the posterior concentration "jumps" from one region W of parameter space to another.
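The internal model selection claim can be read off the standard SLT asymptotics for the local free energy. The following is a minimal worked version of that comparison; the symbols $W_\alpha$, $L_\alpha$, $\lambda_\alpha$ and $n_c$ are notation introduced here for illustration and are not taken from the abstract itself.

% Minimal sketch of the free-energy comparison behind "internal model selection",
% assuming the standard SLT asymptotic expansion (Watanabe, 2009; Lau et al., 2023).
For a neighborhood $W_\alpha$ of a critical point with loss $L_\alpha$ and local learning coefficient $\lambda_\alpha$, the local free energy satisfies, asymptotically in $n$,
\[
  F_n(W_\alpha) \;\approx\; n\,L_\alpha + \lambda_\alpha \log n .
\]
Comparing a simple but inaccurate region with an accurate but complex one, i.e. $L_\alpha > L_\beta$ and $\lambda_\alpha < \lambda_\beta$,
\[
  F_n(W_\alpha) - F_n(W_\beta) \;\approx\; n\,(L_\alpha - L_\beta) \;-\; (\lambda_\beta - \lambda_\alpha)\log n ,
\]
so the posterior favors the low-$\lambda$ region $W_\alpha$ while the logarithmic complexity term dominates, and switches to the low-loss region $W_\beta$ once the linear accuracy term overtakes it, near a critical sample size $n_c$ where the two free energies cross.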
arXiv.org Artificial Intelligence
Oct 10, 2023