A fast point solver for deep nonlinear function approximators

Aug-30-2021–arXiv.org Machine Learning

Deep kernel processes (DKPs) generalise Bayesian neural networks, but do not require us to represent either features or weights. Instead, at each hidden layer they represent and optimize a flexible kernel. Here, we develop a Newton-like method for DKPs that converges in around 10 steps, exploiting matrix solvers initially developed in the control theory literature. This is many times faster than the usual gradient descent approach. While these methods are currently not Bayesian, as they give point estimates, and scale poorly, as they are cubic in the number of datapoints, we hope they will form the basis of a new class of much more efficient approaches to optimizing deep nonlinear function approximators. NNs have recently shown excellent performance on a wide range of previously intractable tasks (e.g. ...). While neural network training is now commonplace, stepping back we can see two problems.
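
To make the contrast between a Newton-like update and plain gradient descent concrete, here is a minimal toy sketch. It is not the DKP solver from the paper and uses no control-theory matrix solver. The scalar objective f(x) = exp(x) - 2x, the step size 0.1, the tolerance, and all function names are assumptions chosen only for illustration.

```python
import math

# Toy objective (an assumption for illustration): f(x) = exp(x) - 2x,
# which has a unique minimum at x = ln 2.

def grad(x):
    """First derivative of f."""
    return math.exp(x) - 2.0

def hess(x):
    """Second derivative of f (always positive, so f is convex)."""
    return math.exp(x)

def newton(x0, tol=1e-10, max_steps=1000):
    """Newton's method: rescale the gradient by the inverse curvature."""
    x, steps = x0, 0
    while abs(grad(x)) > tol and steps < max_steps:
        x -= grad(x) / hess(x)
        steps += 1
    return x, steps

def gradient_descent(x0, lr=0.1, tol=1e-10, max_steps=100000):
    """Plain gradient descent with a fixed (hypothetical) step size."""
    x, steps = x0, 0
    while abs(grad(x)) > tol and steps < max_steps:
        x -= lr * grad(x)
        steps += 1
    return x, steps

newton_x, newton_steps = newton(0.0)
gd_x, gd_steps = gradient_descent(0.0)
print(f"Newton:           x = {newton_x:.10f} after {newton_steps} steps")
print(f"Gradient descent: x = {gd_x:.10f} after {gd_steps} steps")
```

On this toy problem, the Newton iteration reaches the tolerance in a handful of steps, while fixed-step gradient descent needs many more. The exact ratio depends on the chosen step size and says nothing quantitative about DKPs.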

gram matrix, kernel, objective

Country:
- Europe
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
    - Cambridgeshire > Cambridge (0.04)
    - Bristol (0.04)
  - Germany > Bremen
    - Bremen (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.46)
  - Machine Learning
    - Statistical Learning > Gradient Descent (0.48)
    - Neural Networks > Deep Learning (0.46)