A Comprehensive Guide to Stochastic Gradient Descent Algorithms
Unfortunately, reality is a little different, particularly in deep models, where the number of parameters is on the order of tens or hundreds of millions. When the system is relatively shallow, it is easier to find local minima where the training process can stop; in deeper models, the probability of a local minimum becomes smaller and saddle points become increasingly likely instead.
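To make the distinction between a local minimum and a saddle point concrete, here is a minimal sketch that classifies a critical point by the signs of its Hessian eigenvalues. The function names (`hessian_eigenvalues_2x2`, `classify_critical_point`, `all_positive_probability`) are hypothetical and not taken from the article. The last function rests on a deliberately crude toy assumption, namely that each eigenvalue's sign behaves like an independent fair coin flip. Real loss surfaces need not behave this way, but the assumption gives a rough intuition for why minima become rarer as the parameter count grows.

```python
import math


def hessian_eigenvalues_2x2(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]] (closed form)."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius


def classify_critical_point(eigenvalues, tol=1e-12):
    """Classify a point with zero gradient from its Hessian eigenvalues."""
    if any(abs(ev) <= tol for ev in eigenvalues):
        return "degenerate"  # second-order test is inconclusive
    if all(ev > 0 for ev in eigenvalues):
        return "local minimum"
    if all(ev < 0 for ev in eigenvalues):
        return "local maximum"
    return "saddle point"  # mixed signs: curves up in some directions, down in others


def all_positive_probability(n):
    """Toy assumption: each of the n eigenvalue signs is an independent fair coin flip."""
    return 0.5 ** n


# f(x, y) = x**2 + y**2 has Hessian [[2, 0], [0, 2]] at the origin.
bowl = classify_critical_point(hessian_eigenvalues_2x2(2.0, 0.0, 2.0))

# f(x, y) = x**2 - y**2 has Hessian [[2, 0], [0, -2]] at the origin.
saddle = classify_critical_point(hessian_eigenvalues_2x2(2.0, 0.0, -2.0))

print(bowl, "|", saddle)
for n in (2, 10, 100):
    print(n, all_positive_probability(n))
```

Under the coin-flip assumption, the chance that every direction curves upward halves with each additional parameter, so with millions of parameters almost every critical point has at least one descending direction and is therefore a saddle point rather than a minimum.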
Nov-18-2019, 02:53:16 GMT