Here at Udacity, we are tremendously excited to announce the kick-off of the second term of our Artificial Intelligence Nanodegree program. We are excited because we are able to provide a depth of education that is commensurate with university education; because we are bridging the gap between universities and industry by providing you with hands-on projects and partnering with top companies in the field; and, last but certainly not least, because we are able to bring this education to many more people across the globe, at a cost that makes a top-notch AI education realistic for all aspiring learners. During the first term, you enjoyed learning about Game Playing Agents, Simulated Annealing, Constraint Satisfaction, Logic and Planning, and Probabilistic AI from some of the biggest names in the field: Sebastian Thrun, Peter Norvig, and Thad Starner. Term 2 will focus on one of the cutting-edge advancements in AI: Deep Learning. In this term, you will learn about the foundations of neural networks, understand how to train them with techniques such as gradient descent and backpropagation, and learn about the different types of architectures that make neural networks work for a variety of applications.
Yuan, Kun, Ying, Bicheng, Sayed, Ali H.
The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems.
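The equivalence claim can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a heavy-ball momentum update with step size `mu` and momentum parameter `beta`, and it assumes the re-scaled step size is `mu / (1 - beta)`, which the abstract does not state explicitly. The streaming least-squares risk, the function name `steady_state_msd`, and all parameter values are illustrative choices.

```python
import numpy as np


def steady_state_msd(mu=0.001, beta=0.9, dim=5, n_iter=20000,
                     noise_std=0.1, seed=0):
    """Run heavy-ball momentum and re-scaled plain SGD on one data stream.

    Returns (msd_momentum, msd_sgd): the squared deviation from the true
    model, averaged over the last quarter of the iterations.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.standard_normal(dim)

    w_mom = np.zeros(dim)      # iterate of the momentum method
    velocity = np.zeros(dim)   # accumulated gradient of the momentum method
    w_sgd = np.zeros(dim)      # iterate of plain stochastic gradient descent

    # Assumption: the equivalent plain-SGD step size is mu / (1 - beta).
    mu_sgd = mu / (1.0 - beta)

    tail_start = n_iter - n_iter // 4
    err_mom = 0.0
    err_sgd = 0.0
    for i in range(n_iter):
        # One streaming sample: regressor u and noisy measurement d.
        u = rng.standard_normal(dim)
        d = u @ w_true + noise_std * rng.standard_normal()

        # Stochastic gradients of 0.5 * (d - u^T w)^2 at each iterate.
        g_mom = -(d - u @ w_mom) * u
        g_sgd = -(d - u @ w_sgd) * u

        # Heavy-ball momentum with a small constant step size.
        velocity = beta * velocity + g_mom
        w_mom = w_mom - mu * velocity

        # Plain SGD with the larger, re-scaled step size.
        w_sgd = w_sgd - mu_sgd * g_sgd

        if i >= tail_start:
            err_mom += np.sum((w_mom - w_true) ** 2)
            err_sgd += np.sum((w_sgd - w_true) ** 2)

    n_tail = n_iter - tail_start
    return err_mom / n_tail, err_sgd / n_tail


if __name__ == "__main__":
    msd_mom, msd_sgd = steady_state_msd()
    print("momentum MSD:", msd_mom)
    print("re-scaled SGD MSD:", msd_sgd)
```

Under these assumptions the two mean-square deviations should come out close to each other, which is consistent with the abstract's conclusion that, with small constant step sizes and persistent gradient noise, momentum behaves like plain stochastic gradient descent with a larger step size.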
Desjardins, Guillaume, Simonyan, Karen, Pascanu, Razvan, Kavukcuoglu, Koray
We introduce Natural Neural Networks, a novel family of algorithms that speed up convergence by adapting their internal representation during training to improve conditioning of the Fisher matrix. In particular, we show a specific example that employs a simple and efficient reparametrization of the neural network weights by implicitly whitening the representation obtained at each layer, while preserving the feed-forward computation of the network. Such networks can be trained efficiently via the proposed Projected Natural Gradient Descent algorithm (PRONG), which amortizes the cost of these reparametrizations over many parameter updates and is closely related to the Mirror Descent online learning algorithm. We highlight the benefits of our method on both unsupervised and supervised learning tasks, and showcase its scalability by training on the large-scale ImageNet Challenge dataset.
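The idea of whitening a layer's input while preserving the feed-forward computation can be sketched for a single linear layer. This is a minimal illustration and not the PRONG algorithm itself: it assumes a layer of the form z = W h + b, a ZCA-style whitening matrix `U` estimated from a batch of activations, and a mean vector `c`. The function name `whiten_reparametrize` and the regularizer `eps` are hypothetical.

```python
import numpy as np


def whiten_reparametrize(W, b, H, eps=1e-5):
    """Reparametrize z = W h + b as z = V (U (h - c)) + d.

    W: (n_out, n_in) weights, b: (n_out,) bias,
    H: (n_samples, n_in) batch of activations feeding the layer.
    U whitens the centered activations; V and d are chosen so that the
    layer output is unchanged for every input h.
    """
    c = H.mean(axis=0)                      # activation mean
    cov = np.cov(H, rowvar=False)           # (n_in, n_in) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition

    scale = np.sqrt(eigvals + eps)
    U = eigvecs @ np.diag(1.0 / scale) @ eigvecs.T  # whitening matrix
    U_inv = eigvecs @ np.diag(scale) @ eigvecs.T    # its exact inverse

    V = W @ U_inv   # new weights acting on the whitened input
    d = b + W @ c   # new bias absorbing the removed mean
    return V, d, U, c
```

With these definitions, V U (h - c) + d = W (h - c) + b + W c = W h + b, so the network output is unchanged while the new weights `V` see an input with approximately identity covariance. In a training loop one would presumably recompute `U` and `c` only occasionally, which matches the abstract's remark that the cost of the reparametrization is amortized over many parameter updates.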