
Collaborating Authors

Storkey, Amos J.


Moonshine: Distilling with Cheap Convolutions

Neural Information Processing Systems

Many engineers wish to deploy modern neural networks in memory-limited settings, but the development of flexible methods for reducing memory use is in its infancy, and little is known about the resulting cost-benefit trade-offs. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data.
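
As a rough illustration of the attention transfer objective mentioned above, here is a minimal PyTorch-style sketch, assuming the student and teacher feature maps at each matched layer share the same spatial size; the function names are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def attention_map(feats):
    # Spatial attention map: mean of squared activations over channels,
    # flattened and L2-normalised per example.
    a = feats.pow(2).mean(dim=1)                    # (N, H, W)
    return F.normalize(a.flatten(start_dim=1), dim=1)

def attention_transfer_loss(student_feats, teacher_feats):
    # Squared distance between normalised attention maps, summed over
    # the matched (student, teacher) layer pairs.
    return sum((attention_map(s) - attention_map(t)).pow(2).mean()
               for s, t in zip(student_feats, teacher_feats))
```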


GINN: Geometric Illustration of Neural Networks

arXiv.org Machine Learning

This informal technical report details the geometric illustration of decision boundaries for ReLU units in a three-layer fully connected neural network. The network is designed and trained to predict pixel intensity from an (x, y) input location. The Geometric Illustration of Neural Networks (GINN) tool was built to visualise and track the points at which ReLU units switch from active to inactive (or vice versa) as the network undergoes training. Several phenomena were observed and are discussed herein. This technical report is a supporting document to the blog post with online demos, available at http://www.bayeswatch.com/2018/09/17/GINN/.
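
To make the tracking idea concrete, here is a minimal numpy sketch of the bookkeeping such a visualisation involves, with random weights standing in for a trained network and only the first ReLU layer inspected; sizes and names are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy three-layer fully connected network mapping (x, y) -> pixel intensity.
W1, b1 = rng.normal(size=(2, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)
W3, b3 = rng.normal(size=(16, 1)), rng.normal(size=1)

def forward(xy):
    h1 = np.maximum(xy @ W1 + b1, 0.0)        # first ReLU layer
    h2 = np.maximum(h1 @ W2 + b2, 0.0)        # second ReLU layer
    return h2 @ W3 + b3                       # predicted intensity

# Sample a grid of (x, y) locations and record the on/off state of each
# first-layer unit; the boundary of a unit's active region is where its
# pre-activation crosses zero, and re-computing this mask at successive
# training snapshots shows units switching on and off.
xs, ys = np.meshgrid(np.linspace(-1, 1, 200), np.linspace(-1, 1, 200))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
active = (grid @ W1 + b1) > 0                 # (40000, 16) boolean mask
print(active.mean(axis=0))                    # fraction of the grid where each unit is active
```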


CINIC-10 is not ImageNet or CIFAR-10

arXiv.org Machine Learning

In this brief technical report we introduce the CINIC-10 dataset as a plug-in extended alternative for CIFAR-10. It was compiled by combining CIFAR-10 with images selected and downsampled from the ImageNet database. We present the approach to compiling the dataset, illustrate example images for the different classes, give pixel distributions for each part of the repository, and give some standard benchmarks for well-known models. Details for download, usage, and compilation can be found in the associated GitHub repository.
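
As a usage note (not from the report itself), CINIC-10's directory layout lets it be read with torchvision's generic ImageFolder loader. The sketch below assumes the archive has been extracted to a local cinic-10/ directory with the train/valid/test split described in the repository; the normalisation constants shown are rough placeholders for the exact per-channel values listed there.

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# CINIC-10 ships as train/valid/test directories with one sub-folder per class,
# so ImageFolder can read each split directly.
transform = T.Compose([
    T.ToTensor(),
    # Placeholder statistics: substitute the per-channel mean/std from the repository.
    T.Normalize(mean=[0.48, 0.47, 0.43], std=[0.24, 0.24, 0.26]),
])

train_set = ImageFolder("cinic-10/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
```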


Asymptotically exact inference in differentiable generative models

arXiv.org Machine Learning

Many generative models can be expressed as a differentiable function of random inputs drawn from some simple probability density. This framework includes both deep generative architectures such as Variational Autoencoders and a large class of procedurally defined simulator models. We present a method for performing efficient MCMC inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where Approximate Bayesian Computation might otherwise be employed. We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to coherently move between inputs exactly consistent with observations. We validate the method by performing inference tasks in a diverse set of models.
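
To make the setup concrete, here is a minimal PyTorch sketch with a toy differentiable generator (not a model from the paper): the inputs exactly consistent with an observation form the zero set of a differentiable constraint function, and its Jacobian is the quantity a constrained Hamiltonian Monte Carlo scheme uses to keep proposals on that manifold.

```python
import torch
from torch.autograd.functional import jacobian

def generator(u):
    # Toy differentiable simulator: output is a smooth function of the
    # random inputs u (here 4 inputs mapped to 2 outputs).
    return torch.tanh(u[:2]) + 0.1 * u[2:]

y_obs = torch.tensor([0.3, -0.2])

def constraint(u):
    # Inputs exactly consistent with the observation form the manifold
    # {u : constraint(u) = 0}.
    return generator(u) - y_obs

u = torch.zeros(4)
J = jacobian(constraint, u)      # (2, 4) Jacobian of the constraint at u
print(constraint(u), J.shape)
```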


Stochastic Parallel Block Coordinate Descent for Large-Scale Saddle Point Problems

AAAI Conferences

We consider convex-concave saddle point problems with a separable structure and non-strongly convex functions. We propose an efficient stochastic block coordinate descent method using adaptive primal-dual updates, which enables flexible parallel optimization for large-scale problems. Our method shares the efficiency and flexibility of block coordinate descent methods with the simplicity of primal-dual methods, while exploiting the structure of the separable convex-concave saddle point problem. It is capable of solving a wide range of machine learning applications, including robust principal component analysis, Lasso, and feature selection by group Lasso. Theoretically and empirically, we demonstrate significantly better performance than state-of-the-art methods in all these applications.
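
The sketch below is a generic stochastic primal-dual block coordinate loop on a toy least-squares saddle point with fixed stepsizes; it only illustrates the structure of alternating dual updates with randomly selected primal blocks (which could be distributed across workers), and is not the adaptive algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy separable saddle point: min_x max_y  y.T @ (A @ x - b) - 0.5 * ||y||^2,
# whose solution is the least-squares minimiser of ||A @ x - b||^2.
n, d, block = 200, 50, 10
A = rng.normal(size=(n, d)) / np.sqrt(n)
b = rng.normal(size=n)

x, y = np.zeros(d), np.zeros(n)
tau, sigma = 0.2, 0.5                        # fixed primal/dual stepsizes for the sketch

for it in range(3000):
    # Dual ascent step on the full dual variable.
    y += sigma * (A @ x - b - y)
    # Stochastic descent step on a randomly chosen block of primal coordinates.
    idx = rng.choice(d, size=block, replace=False)
    x[idx] -= tau * (A[:, idx].T @ y)

print(np.linalg.norm(A.T @ (A @ x - b)))     # small once the saddle point is reached
```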


Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling

Neural Information Processing Systems

Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov Chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current research addresses the computational benefits of stochastic gradient methods in this setting. Existing techniques rely on estimating the variance or covariance of the subsampling error, and typically assume constant variance. In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.
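
For background, here is a sketch of the standard adaptive Langevin (stochastic gradient Nosé-Hoover) thermostat on a one-dimensional Gaussian toy target with artificially noisy gradients; the covariance-controlled correction that the paper adds on top of this baseline is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: standard normal, U(theta) = theta^2 / 2, so grad U = theta.
# The added Gaussian noise mimics the subsampling error of a stochastic gradient.
def noisy_grad(theta):
    return theta + 0.5 * rng.normal()

h, A, n_steps = 1e-2, 1.0, 200_000
theta, p, xi = 0.0, 0.0, A
samples = []

for _ in range(n_steps):
    p += -xi * p * h - noisy_grad(theta) * h + np.sqrt(2.0 * A * h) * rng.normal()
    theta += p * h
    xi += (p * p - 1.0) * h          # thermostat variable adapts to dissipate the gradient noise
    samples.append(theta)

burn = n_steps // 4
print(np.mean(samples[burn:]), np.var(samples[burn:]))   # approximately 0 and 1
```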


Stochastic Parallel Block Coordinate Descent for Large-scale Saddle Point Problems

arXiv.org Machine Learning

We consider convex-concave saddle point problems with a separable structure and non-strongly convex functions. We propose an efficient stochastic block coordinate descent method using adaptive primal-dual updates, which enables flexible parallel optimization for large-scale problems. Our method shares the efficiency and flexibility of block coordinate descent methods with the simplicity of primal-dual methods, while exploiting the structure of the separable convex-concave saddle point problem. It is capable of solving a wide range of machine learning applications, including robust principal component analysis, Lasso, and feature selection by group Lasso. Theoretically and empirically, we demonstrate significantly better performance than state-of-the-art methods in all these applications.


Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling

arXiv.org Machine Learning

Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov Chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current research addresses the computational benefits of stochastic gradient methods in this setting. Existing techniques rely on estimating the variance or covariance of the subsampling error, and typically assume constant variance. In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.


Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems

arXiv.org Machine Learning

We consider a generic convex-concave saddle point problem with separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems, and incorporate stochastic block coordinate descent with adaptive stepsize into this framework. We show theoretically that the proposed adaptive stepsize potentially achieves a sharper linear convergence rate than existing methods. Additionally, since we can select a "mini-batch" of block coordinates to update, our method is also amenable to parallel processing for large-scale data. We apply the proposed method to regularized empirical risk minimization and show that it performs comparably or, more often, better than state-of-the-art methods on both synthetic and real-world data sets.
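
For concreteness, the regularized empirical risk minimization problems mentioned above fit the separable saddle point form via the standard Fenchel-dual reformulation (generic background, not a derivation specific to this paper):

    \min_x \frac{1}{n}\sum_{i=1}^{n} \phi_i(a_i^\top x) + \lambda g(x)
      \;=\; \min_x \max_y \frac{1}{n}\sum_{i=1}^{n} \big( y_i\, a_i^\top x - \phi_i^*(y_i) \big) + \lambda g(x),

where a_i is the i-th data point, \phi_i is the loss on example i with convex conjugate \phi_i^*, and g is the regularizer. The objective is separable across the dual coordinates y_i (and across blocks of x when g is separable), which is exactly the structure that stochastic primal-dual block coordinate updates exploit.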