Beyond Backprop: Alternating Minimization with co-Activation Memory
Anna Choromanska, Sadhana Kumaravel, Ronny Luss, Irina Rish, Brian Kingsbury, Ravi Tejwani, Djallel Bouneffouf
We propose a novel online algorithm for training deep feedforward neural networks that employs alternating minimization (block-coordinate descent) between the weights and the activation variables. It extends offline alternating minimization approaches to online, continual learning, and improves on stochastic gradient descent (SGD) with backpropagation in several ways: it avoids the vanishing gradient issue, it allows for non-differentiable nonlinearities, and it permits parallel weight updates across the layers. Unlike SGD, our approach employs co-activation memory inspired by the online sparse coding algorithm of [Mairal et al., 2009]. Furthermore, local iterative optimization with explicit activation updates is a potentially more biologically plausible learning mechanism than backpropagation. We provide a theoretical convergence analysis and promising empirical results on several datasets.
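To make the idea in the abstract concrete, here is a minimal sketch of one online alternating-minimization step for a network with a single hidden layer. It is an illustration under stated assumptions, not the authors' algorithm: the class name `AltMinSketch`, the penalty weight `mu`, the `ridge` term, the ReLU nonlinearity, and the closed-form ridge solve for the weights are all hypothetical choices made here for brevity. The per-layer matrices `A` and `B` play the role of a co-activation memory in the spirit of online sparse coding, and each layer's weight update depends only on its own memory, so the two solves could run in parallel.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


class AltMinSketch:
    """Toy one-hidden-layer network trained by alternating minimization.

    Per-sample objective (an assumed form, chosen for illustration):
        0.5 * ||y - W2 relu(c)||^2 + 0.5 * mu * ||c - W1 x||^2
    where c is an explicit activation (code) variable for the hidden layer.
    """

    def __init__(self, d_in, d_hid, d_out, mu=1.0, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = 0.1 * rng.standard_normal((d_hid, d_in))
        self.W2 = 0.1 * rng.standard_normal((d_out, d_hid))
        self.mu, self.ridge = mu, ridge
        # Co-activation memory: running sums of outer products per layer.
        self.A1 = np.zeros((d_in, d_in))
        self.B1 = np.zeros((d_hid, d_in))
        self.A2 = np.zeros((d_hid, d_hid))
        self.B2 = np.zeros((d_out, d_hid))

    def update_code(self, x, y, steps=10):
        """Activation step: (sub)gradient descent on c with weights fixed."""
        c = self.W1 @ x
        # Step size from an upper bound on the curvature in c.
        lr = 1.0 / (np.linalg.norm(self.W2, 2) ** 2 + self.mu)
        for _ in range(steps):
            a = relu(c)
            grad_out = (self.W2.T @ (self.W2 @ a - y)) * (c > 0)
            grad_in = self.mu * (c - self.W1 @ x)
            c = c - lr * (grad_out + grad_in)
        return c

    def solve_weights(self, A, B):
        """Return W such that W (A + ridge * I) = B; A is symmetric."""
        M = A + self.ridge * np.eye(A.shape[0])
        return np.linalg.solve(M, B.T).T

    def step(self, x, y):
        """One online step: update code, update memory, update weights."""
        c = self.update_code(x, y)
        a = relu(c)
        self.A1 += np.outer(x, x)
        self.B1 += np.outer(c, x)
        self.A2 += np.outer(a, a)
        self.B2 += np.outer(y, a)
        # The two solves are independent of each other (layer-parallel).
        self.W1 = self.solve_weights(self.A1, self.B1)
        self.W2 = self.solve_weights(self.A2, self.B2)

    def predict(self, x):
        return self.W2 @ relu(self.W1 @ x)
```

A caller would construct `AltMinSketch(d_in, d_hid, d_out)` and call `step(x, y)` once per incoming sample. No gradient is propagated through the layers, which is why a nonlinearity without a usable derivative could in principle be handled by swapping the activation step for a different local solver.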
Jun-23-2018