r/MachineLearning - [D] Research shows SGD with too large of a mini batch can lead to huge overfitting in deep learning. Why doesn't batch gradient descent have this problem?

Aug-29-2019, 10:34:14 GMT–#artificialintelligence

SGD, in its base form, is defined for one sample at a time: each update uses the gradient of a single example. Batch Gradient Descent applies the same update rule but uses the gradient averaged over the whole training set, so the real differences are the weighting (normalising by the number of samples) and how noisy each step is. In most DL frameworks all of these variants, mini-batch included, live under the same name (SGD): the optimizer simply applies whatever gradient it is given, and the batch size you declare determines which variant you are effectively running.
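To make the point about batch size concrete, here is a minimal sketch in plain Python, assuming a toy 1-D linear model with squared-error loss. The names `mean_gradient` and `gradient_descent` are made up for illustration and are not taken from any framework. The same update loop runs as stochastic, mini-batch, or full-batch gradient descent depending only on `batch_size`.

```python
import random

def mean_gradient(w, batch):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w, averaged over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def gradient_descent(data, batch_size, lr=0.1, epochs=200, seed=0):
    """One update rule for every variant; only batch_size changes."""
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # a fresh sample order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Averaging over the batch keeps the step scale comparable
            # across batch sizes (the "normalisation" part).
            w -= lr * mean_gradient(w, batch)
    return w

# Toy, noise-free data with y = 3x, so every variant should approach w = 3.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 11)]

w_stochastic = gradient_descent(data, batch_size=1)          # one sample per step
w_minibatch = gradient_descent(data, batch_size=4)           # small batches
w_full_batch = gradient_descent(data, batch_size=len(data))  # whole dataset per step

print(w_stochastic, w_minibatch, w_full_batch)
```

On this noise-free toy problem all three settings head to the same solution. It only illustrates that the update rule is shared, and says nothing about the generalisation behaviour the thread is asking about.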


