AITopics | Leimkuhler, Benedict

Collaborating Authors

Leimkuhler, Benedict

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Sampling from Bayesian Neural Network Posteriors with Symmetric Minibatch Splitting Langevin Dynamics

Paulin, Daniel, Whalley, Peter A., Chada, Neil K., Leimkuhler, Benedict

arXiv.org Machine LearningOct-14-2024

We propose a scalable kinetic Langevin dynamics algorithm for sampling parameter spaces of big data and AI applications. Our scheme combines a symmetric forward/backward sweep over minibatches with a symmetric discretization of Langevin dynamics. For a particular Langevin splitting method (UBU), we show that the resulting Symmetric Minibatch Splitting-UBU (SMS-UBU) integrator has bias $O(h^2 d^{1/2})$ in dimension $d>0$ with stepsize $h>0$, despite only using one minibatch per iteration, thus providing excellent control of the sampling bias as a function of the stepsize. We apply the algorithm to explore local modes of the posterior distribution of Bayesian neural networks (BNNs) and evaluate the calibration performance of the posterior predictive probabilities for neural networks with convolutional neural network architectures for classification problems on three different datasets (Fashion-MNIST, Celeb-A and chest X-ray). Our results indicate that BNNs sampled with SMS-UBU can offer significantly better calibration performance compared to standard methods of training and stochastic weight averaging.

artificial intelligence, machine learning, momentum, (21 more...)

arXiv.org Machine Learning

2410.1978

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Unbiased Kinetic Langevin Monte Carlo with Inexact Gradients

Chada, Neil K., Leimkuhler, Benedict, Paulin, Daniel, Whalley, Peter A.

arXiv.org Machine LearningNov-8-2023

We present an unbiased method for Bayesian posterior means based on kinetic Langevin dynamics that combines advanced splitting methods with enhanced gradient approximations. Our approach avoids Metropolis correction by coupling Markov chains at different discretization levels in a multilevel Monte Carlo approach. Theoretical analysis demonstrates that our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem. It can achieve accuracy $\epsilon>0$ for estimating expectations of Lipschitz functions in $d$ dimensions with $\mathcal{O}(d^{1/4}\epsilon^{-2})$ expected gradient evaluations, without assuming warm start. We exhibit similar bounds using both approximate and stochastic gradients, and our method's computational cost is shown to scale logarithmically with the size of the dataset. The proposed method is tested using a multinomial regression problem on the MNIST dataset and a Poisson regression model for soccer scores. Experiments indicate that the number of gradient evaluations per effective sample is independent of dimension, even when using inexact gradients. For product distributions, we give dimension-independent variance bounds. Our results demonstrate that the unbiased algorithm we present can be much more efficient than the ``gold-standard" randomized Hamiltonian Monte Carlo.

artificial intelligence, gradient, machine learning, (14 more...)

arXiv.org Machine Learning

2311.05025

Country: Europe > United Kingdom (0.27)

Genre: Research Report > New Finding (0.85)

Industry: Leisure & Entertainment > Sports > Soccer (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)

Add feedback

Multirate Training of Neural Networks

Vlaar, Tiffany, Leimkuhler, Benedict

arXiv.org Machine LearningJun-20-2021

We propose multirate training of neural networks: partitioning neural network parameters into "fast" and "slow" parts which are trained simultaneously using different learning rates. By choosing appropriate partitionings we can obtain large computational speed-ups for transfer learning tasks. We show that for various transfer learning applications in vision and NLP we can fine-tune deep neural networks in almost half the time, without reducing the generalization performance of the resulting model. We also discuss other splitting choices for the neural network parameters which are beneficial in enhancing generalization performance in settings where neural networks are trained from scratch. Finally, we propose an additional multirate technique which can learn different features present in the data by training the full network on different time scales simultaneously. The benefits of using this approach are illustrated for ResNet architectures on image data. Our paper unlocks the potential of using multirate techniques for neural network training and provides many starting points for future work in this area.

artificial intelligence, machine learning, neural network, (5 more...)

arXiv.org Machine Learning

2106.10771

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Better Training using Weight-Constrained Stochastic Dynamics

Leimkuhler, Benedict, Vlaar, Tiffany, Pouchon, Timothée, Storkey, Amos

arXiv.org Machine LearningJun-20-2021

We employ constraints to control the parameter space of deep neural networks throughout training. The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of classification boundaries, control weight magnitudes and stabilize deep neural networks, and thus enhance the robustness of training algorithms and the generalization capabilities of neural networks. We provide a general approach to efficiently incorporate constraints into a stochastic gradient Langevin framework, allowing enhanced exploration of the loss landscape. We also present specific examples of constrained training methods motivated by orthogonality preservation for weight matrices and explicit weight normalizations. Discretization schemes are provided both for the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta further improve sampling efficiency. These optimization schemes can be used directly, without needing to adapt neural network architecture design choices or to modify the objective with regularization terms, and see performance improvements in classification tasks.

constraint, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

2106.10704

Country: North America > United States (0.45)

Genre: Research Report (1.00)

Industry: Education (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Constraint-Based Regularization of Neural Networks

Leimkuhler, Benedict, Pouchon, Timothée, Vlaar, Tiffany, Storkey, Amos

arXiv.org Machine LearningJun-17-2020

We propose a method for efficiently incorporating constraints into a stochastic gradient Langevin framework for the training of deep neural networks. Constraints allow direct control of the parameter space of the model. Appropriately designed, they reduce the vanishing/exploding gradient problem, control weight magnitudes and stabilize deep neural networks and thus improve the robustness of training algorithms and the generalization capabilities of the trained neural network. We present examples of constrained training methods motivated by orthogonality preservation for weight matrices and explicit weight normalizations. We describe the methods in the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta help to improve sampling efficiency. The methods are explored in test examples in image classification and natural language processing.

constraint, deep learning, neural network, (19 more...)

arXiv.org Machine Learning

2006.10114

Country: North America > United States (0.67)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Partitioned integrators for thermodynamic parameterization of neural networks

Leimkuhler, Benedict, Matthews, Charles, Vlaar, Tiffany

arXiv.org Machine LearningAug-30-2019

Stochastic Gradient Langevin Dynamics, the "unadjusted Langevin algorithm", and Adaptive Langevin Dynamics (also known as Stochastic Gradient Nos\'{e}-Hoover dynamics) are examples of existing thermodynamic parameterization methods in use for machine learning, but these can be substantially improved. We find that by partitioning the parameters based on natural layer structure we obtain schemes with rapid convergence for data sets with complicated loss landscapes. We describe easy-to-implement hybrid partitioned numerical algorithms, based on discretized stochastic differential equations, which are adapted to feed-forward neural networks, including LaLa (a multi-layer Langevin algorithm), AdLaLa (combining the adaptive Langevin and Langevin algorithms) and LOL (combining Langevin and Overdamped Langevin); we examine the convergence of these methods using numerical studies and compare their performance among themselves and in relation to standard alternatives such as stochastic gradient descent and ADAM. We present evidence that thermodynamic parameterization methods can be (i) faster, (ii) more accurate, and (iii) more robust than standard algorithms incorporated into machine learning frameworks, in particular for data sets with complicated loss landscapes. Moreover, we show in numerical studies that methods based on sampling excite many degrees of freedom. The equipartition property, which is a consequence of their ergodicity, means that these methods keep in play an ensemble of low-loss states during the training process. By drawing parameter states from a sufficiently rich distribution of nearby candidate states, we show that the thermodynamic schemes produce smoother classifiers, improve generalization and reduce overfitting compared to traditional optimizers.

adlala, educational setting, neural network, (18 more...)

arXiv.org Machine Learning

1908.11843

Country: Europe > United Kingdom (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications

Heber, Frederik, Trstanova, Zofia, Leimkuhler, Benedict

arXiv.org Machine LearningMar-20-2019

The fundamental role of neural networks (NNs) is readily apparent from their widespread use in machine learning in applications such as natural language processing [72], social network analysis [26], medical diagnosis [6, 35], vision systems [66], and robotic path planning [44]. The greatest success of these models lies in their flexibility, their ability to represent complex, nonlinear relationships in high-dimensional data sets, and the availability of frameworks that allow NNs to be implemented on rapidly evolving GPU platforms [40, 29]. The industrial appetite for deep learning has led to very rapid expansion of the subject in recent years, although, as pointed out by Dunson [19], at times the mathematical and theoretical understanding of these methods has been swept aside in the rush to advance the methodology. The potential impact on society of machine learning algorithms demands that their exposition and use be subject to the highest standards of clarity, ease of interpretation, and uncertainty quantification. Typical NN training seeks to optimize the parameters of the network (biases and weights) under the constraint that the training data set is well approximated [28, 23].

dataset, deep learning, neural network, (22 more...)

arXiv.org Machine Learning

1903.0864

Country:

North America > United States (0.27)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.87)

Add feedback

Ensemble preconditioning for Markov chain Monte Carlo simulation

Matthews, Charles, Weare, Jonathan, Leimkuhler, Benedict

arXiv.org Machine LearningJul-13-2016

We describe parallel Markov chain Monte Carlo methods that propagate a collective ensemble of paths, with local covariance information calculated from neighboring replicas. The use of collective dynamics eliminates multiplicative noise and stabilizes the dynamics thus providing a practical approach to difficult anisotropic sampling problems in high dimensions. Numerical experiments with model problems demonstrate that dramatic potential speedups, compared to various alternative schemes, are attainable.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1607.03954

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

Add feedback

Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling

Shang, Xiaocheng, Zhu, Zhanxing, Leimkuhler, Benedict, Storkey, Amos J.

Neural Information Processing SystemsDec-31-2015

Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov Chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current research addresses the computational benefits of stochastic gradient methods in this setting. Existing techniques rely on estimating the variance or covariance of the subsampling error, and typically assume constant variance. In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.

artificial intelligence, ccadl, survey article, (16 more...)

Neural Information Processing Systems

Technology: