A fast point solver for deep nonlinear function approximators
Deep kernel processes (DKPs) generalise Bayesian neural networks, but do not require us to represent either features or weights. Instead, at each hidden layer they represent and optimize a flexible kernel. Here, we develop a Newton-like method for DKPs that converges in around 10 steps, exploiting matrix solvers initially developed in the control theory literature. These solvers are many times faster than the usual gradient descent approach. The methods are currently not Bayesian, as they give only point estimates, and they scale poorly, as their cost is cubic in the number of datapoints. Nonetheless, we hope they will form the basis of a new class of much more efficient approaches to optimizing deep nonlinear function approximators. Neural networks (NNs) have recently shown excellent performance on a wide range of previously intractable tasks. While neural network training is now commonplace, stepping back we can see two problems.
Aug-30-2021