Early Stopping is Nonparametric Variational Inference

Dougal Maclaurin, David Duvenaud, Ryan P. Adams

arXiv.org (Machine Learning)

We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization updates. By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood. We can use this bound to optimize hyperparameters instead of using cross-validation. This Bayesian interpretation of SGD suggests improved, overfitting-resistant optimization procedures, and gives a theoretical foundation for popular tricks such as early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
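The bound described in the abstract can be accumulated alongside ordinary gradient descent: starting from an initial distribution with known entropy, each step theta <- theta - alpha * grad L(theta) changes the entropy of the implicit distribution by approximately log|det(I - alpha * H)|, where H is the Hessian of the loss. The sketch below is a minimal illustration under simplifying assumptions, not the authors' implementation: it uses the first-order approximation log|det(I - alpha * H)| ~= -alpha * tr(H) and a single Hutchinson-style Hessian-vector product per step to estimate the trace; `loss_fn`, `log_joint_fn`, and the other names are hypothetical.

```python
import jax
import jax.numpy as jnp

def entropy_change_estimate(loss_fn, theta, step_size, key):
    """Estimate log|det(I - alpha*H)| ~= -alpha * tr(H) with one
    Hessian-vector product and a random Rademacher probe vector."""
    v = jax.random.rademacher(key, theta.shape, dtype=theta.dtype)
    # jvp of grad gives the Hessian-vector product H @ v.
    hvp = jax.jvp(jax.grad(loss_fn), (theta,), (v,))[1]
    return -step_size * jnp.dot(v, hvp)  # E[v^T H v] = tr(H)

def sgd_with_elbo(loss_fn, log_joint_fn, theta0, entropy0, step_size,
                  num_steps, key):
    """Run gradient descent while tracking the entropy of the implicit
    variational distribution, returning a single-sample estimate of the
    variational lower bound on the log marginal likelihood."""
    theta, entropy = theta0, entropy0
    for _ in range(num_steps):
        key, sub = jax.random.split(key)
        entropy += entropy_change_estimate(loss_fn, theta, step_size, sub)
        theta = theta - step_size * jax.grad(loss_fn)(theta)
    elbo = log_joint_fn(theta) + entropy  # E_q[log p(theta, D)] + H[q]
    return theta, elbo
```

In this reading, the returned `elbo` is a one-sample estimate; drawing several `theta0` values from the initial distribution and averaging the resulting bounds reduces its variance, and the bound can then be compared across hyperparameter settings in place of cross-validation, as the abstract suggests.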
