Beyond Backprop: Alternating Minimization with co-Activation Memory

Anna Choromanska, Sadhana Kumaravel, Ronny Luss, Irina Rish, Brian Kingsbury, Ravi Tejwani, Djallel Bouneffouf

arXiv.org Machine Learning 

We propose a novel online algorithm for training deep feedforward neural networks that employs alternating minimization (block-coordinate descent) between the weights and activation variables. It extends offline alternating minimization approaches to online, continual learning, and improves over stochastic gradient descent (SGD) with backpropagation in several ways: it avoids the vanishing gradient issue, it allows for non-differentiable nonlinearities, and it permits parallel weight updates across the layers. Unlike SGD, our approach employs co-activation memory inspired by the online sparse coding algorithm of [Mairal et al., 2009]. Furthermore, local iterative optimization with explicit activation updates is a potentially more biologically plausible learning mechanism than backpropagation. We provide theoretical convergence analysis and promising empirical results on several datasets.
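To make the abstract's description concrete, below is a minimal sketch of online alternating minimization with co-activation memory for a ReLU feedforward network. It assumes a quadratic-penalty formulation of the layer constraints; the class name `AltMinNet` and parameters `rho` (penalty weight) and `lam` (ridge regularizer) are illustrative choices, not the authors' reference implementation. The running statistics `A` and `B` play the role of the co-activation memory, in the spirit of the sufficient statistics kept by the online dictionary learning algorithm of Mairal et al., 2009.

```python
# Sketch: online alternating minimization with co-activation memory
# (assumed quadratic-penalty formulation; not the authors' exact code).
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class AltMinNet:
    def __init__(self, dims, rho=1.0, lam=1e-3):
        self.dims = dims                      # e.g. [784, 256, 10]
        self.rho, self.lam = rho, lam
        self.W = [0.1 * rng.standard_normal((m, n))
                  for n, m in zip(dims[:-1], dims[1:])]
        # Co-activation memory: running sufficient statistics per layer,
        # A_l = sum a_l a_l^T and B_l = sum a_{l+1} a_l^T, analogous to
        # the memory of online sparse coding (Mairal et al., 2009).
        self.A = [np.zeros((n, n)) for n in dims[:-1]]
        self.B = [np.zeros((m, n)) for n, m in zip(dims[:-1], dims[1:])]

    def _update_activations(self, x, y):
        # Forward pass initializes the activation variables; the output
        # activation is clamped to the target, then each hidden activation
        # is relaxed toward both its lower and upper layer by solving a
        # small ridge system (no end-to-end gradient chain, so a
        # non-differentiable nonlinearity is not a problem).
        a = [x]
        for W in self.W:
            a.append(relu(W @ a[-1]))
        a[-1] = y
        for l in range(len(self.W) - 1, 0, -1):
            below = relu(self.W[l - 1] @ a[l - 1])
            M = self.rho * np.eye(self.dims[l]) + self.W[l].T @ self.W[l]
            rhs = self.rho * below + self.W[l].T @ a[l + 1]
            a[l] = relu(np.linalg.solve(M, rhs))
        return a

    def partial_fit(self, x, y):
        # One online step: update activations for this sample, accumulate
        # co-activation statistics, then refresh each layer's weights by a
        # regularized least-squares solve. The per-layer weight updates are
        # independent and could run in parallel.
        a = self._update_activations(x, y)
        for l in range(len(self.W)):
            self.A[l] += np.outer(a[l], a[l])
            self.B[l] += np.outer(a[l + 1], a[l])
            reg = self.lam * np.eye(self.dims[l])
            self.W[l] = np.linalg.solve(self.A[l] + reg, self.B[l].T).T

    def predict(self, x):
        for W in self.W:
            x = relu(W @ x)
        return x
```

As a usage note, one would call `partial_fit(x, y)` on each incoming sample (or mini-batch, with outer products replaced by matrix products), which is what makes the scheme online rather than the offline alternating minimization it extends.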
