
Collaborating Authors: Lau, Edmund


Estimating the Local Learning Coefficient at Scale

arXiv.org Artificial Intelligence

The local learning coefficient (LLC) is a principled way of quantifying model complexity, originally derived in the context of Bayesian statistics using singular learning theory (SLT). Several methods are known for numerically estimating the LLC, but so far these methods have not been extended to the scale of modern deep learning architectures or data sets. Using a method developed in arXiv:2308.12108 [stat.ML], we empirically show how the LLC may be measured accurately and self-consistently for deep linear networks (DLNs) of up to 100M parameters. We also show that the estimated LLC has the rescaling invariance that holds for the theoretical quantity.
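The rescaling invariance mentioned at the end of the abstract is easy to state concretely. Below is a minimal numpy sketch (not the paper's code) of the symmetry for a three-layer deep linear network: scaling one layer by a nonzero constant c and the next by 1/c leaves the network function, and hence the theoretical LLC, unchanged, so a well-behaved estimator should return the same value at both parameter points.

```python
# Minimal sketch (assumed setup, not the paper's code): the rescaling
# symmetry of a deep linear network f(x) = W3 @ W2 @ W1 @ x.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(16, 16))
W3 = rng.normal(size=(4, 16))
x = rng.normal(size=(8, 32))  # batch of 32 inputs

def dln(W1, W2, W3, x):
    return W3 @ W2 @ W1 @ x

c = 7.3  # arbitrary nonzero rescaling factor
out_original = dln(W1, W2, W3, x)
out_rescaled = dln(c * W1, W2 / c, W3, x)

# The two parameterizations define the same function ...
assert np.allclose(out_original, out_rescaled)
# ... so the estimated LLC should agree at both parameter points.
```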


Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

arXiv.org Artificial Intelligence

The apparent simplicity of the Toy Model of Superposition (TMS) proposed in Elhage et al. (2022) conceals a remarkably intricate phase structure. During training, a plateau in the loss is often followed by a sudden discrete drop, suggesting some development in the network's internal structure. To shed light on these transitions and their significance, this paper examines the dynamical transitions in TMS during SGD training, connecting them to phase transitions of the Bayesian posterior with respect to sample size n. While the former transitions have been observed in several recent works in deep learning (Olsson et al., 2022; McGrath et al., 2022; Wei et al., 2022a), their formal status has remained elusive. In contrast, phase transitions of the Bayesian posterior are mathematically well-defined in Singular Learning Theory (SLT) (Watanabe, 2009). Using SLT, we can show formally that the Bayesian posterior is subject to an internal model selection mechanism in the following sense: for small training sample size n, the posterior prefers critical points with low complexity but potentially high loss. The opposite is true for high n, where the posterior prefers low-loss critical points at the cost of higher complexity. The measure of complexity here is very specific: it is the local learning coefficient, λ, of the critical points, first alluded to by Watanabe (2009, §7.6) and clarified recently in Lau et al. (2023). We can think of this internal model selection as a discrete dynamical process: at various critical sample sizes the posterior concentration "jumps" from one region W of parameter space to another.
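The loss/complexity trade-off described here follows from the SLT asymptotics, under which the free energy of a posterior region around a critical point w* behaves like F_n ≈ n L(w*) + λ log n. The toy Python sketch below (with made-up loss and LLC values, not figures from the paper) shows the preference flipping from a simple-but-lossy region to a complex-but-accurate one as n grows.

```python
# Toy illustration (loss/LLC values are invented) of the internal model
# selection described above. By SLT, the free energy of a posterior
# region around a critical point w* satisfies, asymptotically,
#     F_n ~= n * L(w*) + llc * log(n),
# so low-complexity regions win at small n, low-loss regions at large n.
import numpy as np

def free_energy(n, loss, llc):
    return n * loss + llc * np.log(n)

# Hypothetical critical points: simple-but-lossy vs. complex-but-accurate.
simple = dict(loss=0.10, llc=1.0)
complex_ = dict(loss=0.05, llc=4.0)

for n in [10, 100, 1_000, 10_000]:
    f_s = free_energy(n, **simple)
    f_c = free_energy(n, **complex_)
    winner = "simple" if f_s < f_c else "complex"
    print(f"n={n:>6}: F_simple={f_s:8.2f}  F_complex={f_c:8.2f}  -> {winner}")
# The posterior "jumps" near the n where the two free energies cross.
```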


Quantifying degeneracy in singular models via the learning coefficient

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) are singular statistical models which exhibit complex degeneracies. In this work, we illustrate how a quantity known as the learning coefficient, introduced in singular learning theory, precisely quantifies the degree of degeneracy in deep neural networks. Importantly, we demonstrate that degeneracy in DNNs cannot be accounted for by simply counting the number of "flat" directions. We propose a computationally scalable approximation of a localized version of the learning coefficient using stochastic gradient Langevin dynamics. To validate our approach, we demonstrate its accuracy in low-dimensional models with known theoretical values. Importantly, the local learning coefficient can correctly recover the ordering of degeneracy between various parameter regions of interest. An experiment on MNIST shows the local learning coefficient can reveal the inductive bias of stochastic optimizers toward more or less degenerate critical points.
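To make the SGLD-based approximation concrete, here is a hedged one-parameter sketch (details simplified relative to the paper). For the singular model with population loss L(w) = w^4 the theoretical learning coefficient is 1/4, and an estimator of the form llc_hat = n * beta * (E_posterior[L(w)] - L(w*)), with beta = 1/log n and the expectation taken under a localized tempered posterior sampled by SGLD, should recover it. The step size, localization strength, and chain length below are illustrative choices.

```python
# Hedged sketch of an SGLD-based local learning coefficient estimate
# (simplified; hyperparameters are illustrative). For L(w) = w^4 the
# theoretical learning coefficient is 1/4.
import numpy as np

rng = np.random.default_rng(0)

def L(w):            # population loss with a degenerate minimum at 0
    return w**4

def grad_L(w):
    return 4 * w**3

n = 100_000          # nominal sample size
beta = 1.0 / np.log(n)
gamma = 1.0          # strength of the Gaussian localizer around w_star
w_star = 0.0
eps = 1e-4           # SGLD step size

w = w_star
samples = []
for step in range(200_000):
    # Langevin step targeting exp(-n*beta*L(w)) times the localizer.
    drift = -n * beta * grad_L(w) - gamma * (w - w_star)
    w += 0.5 * eps * drift + np.sqrt(eps) * rng.normal()
    if step > 50_000:         # discard burn-in
        samples.append(L(w))

llc_hat = n * beta * (np.mean(samples) - L(w_star))
print(f"estimated LLC ~ {llc_hat:.3f} (theory: 0.25)")
```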


Variational Bayesian Neural Networks via Resolution of Singularities

arXiv.org Artificial Intelligence

In this work, we advocate for the importance of singular learning theory (SLT) as it pertains to the theory and practice of variational inference in Bayesian neural networks (BNNs). To begin, using SLT, we lay to rest some of the confusion surrounding discrepancies between downstream predictive performance, measured via, e.g., the test log predictive density, and the variational objective. Next, we use the SLT-corrected asymptotic form for singular posterior distributions to inform the design of the variational family itself. Specifically, we build upon the idealized variational family introduced in Bhattacharya et al. (2020), which is theoretically appealing but practically intractable. Our proposal takes shape as a normalizing flow whose base distribution is a carefully initialized generalized gamma. We conduct experiments comparing this to the canonical Gaussian base distribution and show improvements in terms of variational free energy and variational generalization error.
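As a rough illustration of the base-distribution choice (not the paper's implementation), the sketch below draws flow base samples with a generalized gamma radial law via scipy's gengamma, whose density is proportional to x^(a*c - 1) * exp(-x^c), alongside the canonical Gaussian base. The shape parameters and dimension are placeholders, not the paper's initialization.

```python
# Illustrative sketch (assumed parameters, not the paper's code):
# a generalized-gamma base distribution for a normalizing flow,
# compared with the canonical Gaussian base.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

a, c = 0.5, 2.0                     # hypothetical shape parameters
radial = stats.gengamma(a, c).rvs(size=10_000, random_state=rng)

# Attach random unit directions so the base lives in d dimensions; the
# generalized-gamma radial law is meant to better match the tails of
# singular posteriors than a Gaussian does.
d = 4
dirs = rng.normal(size=(10_000, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
base_samples = radial[:, None] * dirs

gauss_samples = rng.normal(size=(10_000, d))  # canonical Gaussian base
print("gengamma base std:", base_samples.std())
print("gaussian base std:", gauss_samples.std())
```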