Stochastic Gradient Descent (SGD) with Python - PyImageSearch
In a "purist" implementation of SGD, your mini-batch size would be set to 1. However, in practice we often use mini-batches that are > 1. Typical values include 32, 64, 128, and 256. There are two main reasons for this. First, using batch sizes > 1 helps reduce variance in the parameter update, ultimately leading to more stable convergence. Second, optimized matrix operation libraries are often more efficient when the input matrix size is a power of 2. In general, the mini-batch size is not a hyperparameter you should worry much about: determine how many training examples will fit in your GPU/main memory, then use the nearest power of 2 as the batch size.
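To make the idea concrete, here is a minimal sketch of how a training loop might slice a dataset into mini-batches. The `next_batch` helper and the toy data are hypothetical (not from the original article), but the slicing pattern is the standard one:

```python
import numpy as np

def next_batch(X, y, batch_size=32):
    # Walk over the dataset in steps of `batch_size`,
    # yielding matching slices of the features and labels.
    # The final batch may be smaller if the dataset size
    # is not an exact multiple of the batch size.
    for i in range(0, X.shape[0], batch_size):
        yield (X[i:i + batch_size], y[i:i + batch_size])

# Hypothetical toy data: 100 samples, 2 features each.
X = np.random.randn(100, 2)
y = np.random.randint(0, 2, 100)

# With batch_size=32, 100 samples split into batches of 32, 32, 32, and 4.
batches = list(next_batch(X, y, batch_size=32))
```

In a real SGD loop, each `(X_batch, y_batch)` pair would be used to compute the gradient and apply one parameter update before moving to the next batch.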
Dec-23-2016, 17:35:15 GMT