An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks
Roulet, Vincent, Harchaoui, Zaid
Deep networks have achieved remarkable performance in several application domains such as computer vision, natural language processing and genomics (Krizhevsky et al. 2012, Pennington et al. 2014, Duvenaud et al. 2015). A deep network can be framed as a chain of composition of modules, where each module is typically the composition of a nonlinear function and an affine transformation. The last module in the chain is usually task-specific and can be expressed either in analytical form as in supervised classification or as the solution of an optimization problem in dimension reduction or clustering. The optimization problem arising when training a deep network is often framed as a non-convex optimization problem, dismissing the structure of the objective yet central to the software implementation. Indeed optimization algorithms used to train deep networks proceed by making calls to first-order (or second-order) oracles relying on dynamic programming such as gradient back-propagation (Werbos 1994, Rumelhart et al. 1986, Lecun 1988).
Feb-20-2020
- Country:
- North America > United States
- Washington > King County
- Seattle (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Washington > King County
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Genre:
- Research Report (0.40)
- Technology: