A Brief (and Comprehensive) Guide to Stochastic Gradient Descent Algorithms - Giuseppe Bonaccorso

Stochastic Gradient Descent (SGD) is a very powerful technique, currently employed to optimize most deep learning models. However, the vanilla algorithm has many limitations, in particular when the problem is ill-conditioned: in that case it can converge very slowly and may never reach the global minimum. In this post, we're going to analyze how it works and look at the most important variations that can speed up convergence in deep models. First of all, it's necessary to standardize the naming. In some books, the expression "Stochastic Gradient Descent" refers to an algorithm that operates on a batch size equal to 1, while "Mini-batch Gradient Descent" is used when the batch size is greater than 1.
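
To make the naming distinction concrete, here is a minimal NumPy sketch of the vanilla update rule. It is only an illustration under stated assumptions: the linear least-squares objective, the function name minibatch_sgd, and the values chosen for lr, epochs, and the synthetic data are all hypothetical and do not come from the post. The same loop covers both cases: batch_size=1 matches what some books call "Stochastic Gradient Descent", while batch_size > 1 matches "Mini-batch Gradient Descent".

```python
import numpy as np


def minibatch_sgd(X, y, batch_size=1, lr=0.02, epochs=100, seed=0):
    """Vanilla SGD on a mean-squared-error objective for a linear model.

    Hypothetical helper for illustration only. batch_size=1 is the
    one-sample variant; batch_size > 1 is the mini-batch variant.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        # Reshuffle the samples at the start of every epoch.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            # Gradient of the mean squared error over the current batch.
            grad = 2.0 * X[idx].T @ residual / len(idx)
            # Vanilla update: step against the gradient, no momentum.
            w -= lr * grad
    return w


# Synthetic, noiseless data (an assumption made so both runs can be compared).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
true_w = np.array([1.5, -0.5])
y = X @ true_w

w_sgd = minibatch_sgd(X, y, batch_size=1)    # "Stochastic Gradient Descent"
w_mb = minibatch_sgd(X, y, batch_size=32)    # "Mini-batch Gradient Descent"
```

On this small, well-conditioned problem both settings should recover weights close to true_w. The only difference is how many samples contribute to each gradient estimate, and therefore how noisy each step is.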
