An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks

Roulet, Vincent, Harchaoui, Zaid

arXiv.org Machine Learning 

Deep networks have achieved remarkable performance in several application domains such as computer vision, natural language processing and genomics (Krizhevsky et al. 2012, Pennington et al. 2014, Duvenaud et al. 2015). A deep network can be framed as a chain of composition of modules, where each module is typically the composition of a nonlinear function and an affine transformation. The last module in the chain is usually task-specific and can be expressed either in analytical form as in supervised classification or as the solution of an optimization problem in dimension reduction or clustering. The optimization problem arising when training a deep network is often framed as a non-convex optimization problem, dismissing the structure of the objective yet central to the software implementation. Indeed optimization algorithms used to train deep networks proceed by making calls to first-order (or second-order) oracles relying on dynamic programming such as gradient back-propagation (Werbos 1994, Rumelhart et al. 1986, Lecun 1988).

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found