Learning ReLU Networks via Alternating Minimization

Gauri Jagatap, Chinmay Hegde

arXiv.org Machine Learning 

Motivation. Deep neural networks have found success in a wide range of machine learning applications. However, despite this empirical success, a rigorous algorithmic understanding of how to train such networks is still lacking. Our focus in this paper is on a class of neural networks with rectified linear units (ReLUs) as activation functions. The method of choice for training such networks is (stochastic) gradient descent. ReLU networks are computationally less expensive to train than networks with tanh or sigmoid activations, since they generally involve simpler gradient update steps. Because ReLU networks are both useful and amenable to analysis, several recent papers have addressed the problem of provably showing that gradient descent for ReLU networks succeeds under various assumptions [1, 2, 3, 4].

Our contributions. In this paper, we depart from the standard approach of gradient descent for learning ReLU-based neural networks. Instead, we propose a new approach based on the technique of alternating minimization. In contrast with gradient-based learning, our algorithm is parameter-free: it involves no tuning parameters (such as learning rate, damping factor, or dropout ratio) other than the number of training epochs. To the best of our knowledge, such an alternating minimization approach is novel in the context of neural network learning.
