Stochastic Gradient Descent (SGD) with Python - PyImageSearch
In a "purist" implementation of SGD, your mini-batch size would be set to 1. However, in practice we often use mini-batches that are > 1. Typical values include 32, 64, 128, and 256. There are two main reasons for this. First, using batch sizes > 1 helps reduce variance in the parameter update, ultimately leading to more stable convergence. Second, optimized matrix operation libraries are often more efficient when the input matrix size is a power of 2. In general, the mini-batch size is not a hyperparameter you should worry much about: determine how many training examples will fit in your GPU/main memory, then use the nearest power of 2 as the batch size.
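To make the idea concrete, here is a minimal sketch of how a training loop might slice a dataset into mini-batches. The `next_batch` helper and the toy data are hypothetical (not from the original article), but the slicing pattern is the standard one:

```python
import numpy as np

def next_batch(X, y, batch_size=32):
    # Walk over the dataset in steps of `batch_size`,
    # yielding matching slices of the features and labels.
    # The final batch may be smaller if the dataset size
    # is not an exact multiple of the batch size.
    for i in range(0, X.shape[0], batch_size):
        yield (X[i:i + batch_size], y[i:i + batch_size])

# Hypothetical toy data: 100 samples, 2 features each.
X = np.random.randn(100, 2)
y = np.random.randint(0, 2, 100)

# With batch_size=32, 100 samples split into batches of 32, 32, 32, and 4.
batches = list(next_batch(X, y, batch_size=32))
```

In a real SGD loop, each `(X_batch, y_batch)` pair would be used to compute the gradient and apply one parameter update before moving to the next batch.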
Dec-23-2016, 17:35:15 GMT