MomentumRNN: Integrating Momentum into Recurrent Neural Networks
Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called {\em MomentumRNNs}. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long-short term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance.
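To make the abstract's idea concrete, here is a minimal sketch of a momentum-augmented recurrent step, assuming the recurrence suggested by the analogy to heavy-ball gradient descent: the input-driven term is accumulated through a velocity state before entering the usual RNN nonlinearity. The names (`mu`, `s`, `W`, `U`) and the exact update form are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def momentum_rnn_step(x_t, h_prev, v_prev, W, U, b, mu=0.9, s=1.0):
    """One step of a momentum-style recurrent cell (illustrative sketch).

    v_t = mu * v_{t-1} + s * (W @ x_t)   # momentum accumulation of the input drive
    h_t = tanh(U @ h_{t-1} + v_t + b)    # standard RNN nonlinearity on the result
    """
    v_t = mu * v_prev + s * (W @ x_t)
    h_t = np.tanh(U @ h_prev + v_t + b)
    return h_t, v_t

# Toy usage: run a short random sequence through the cell.
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = rng.standard_normal((d_h, d_in)) * 0.1
U = rng.standard_normal((d_h, d_h)) * 0.1
b = np.zeros(d_h)
h, v = np.zeros(d_h), np.zeros(d_h)
for _ in range(5):
    h, v = momentum_rnn_step(rng.standard_normal(d_in), h, v, W, U, b)
```

Setting `mu = 0` recovers a plain Elman-style RNN step, which is what makes the momentum term an easy drop-in modification for other recurrent cells.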
Reviewers found that our framework: 1) makes a very novel and thought-provoking connection between RNNs and optimization [R3].
Below we address the concerns raised by the reviewers. We believe there is a misunderstanding: in an LSTM, the cell state performs additive integration of the input so that the gradients do not vanish. We hope the reviewers will reevaluate this crucial point. Our aim is to bring in new ideas from optimization to design better RNNs.
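The additive cell-state integration the response refers to can be illustrated as follows. This is a sketch of the standard LSTM cell-state update, not the paper's code; the point is that new information enters `c_t` by addition through the gates, rather than through a repeated squashing nonlinearity, which is why gradients along the cell state do not vanish as quickly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_state_step(c_prev, f_logit, i_logit, g_preact):
    """Standard LSTM cell-state update: c_t = f_t * c_{t-1} + i_t * g_t."""
    f_t = sigmoid(f_logit)   # forget gate
    i_t = sigmoid(i_logit)   # input gate
    g_t = np.tanh(g_preact)  # candidate update
    # Additive integration: the new information i_t * g_t is *added* to the
    # gated previous state instead of being passed through another nonlinearity.
    return f_t * c_prev + i_t * g_t

# With a forget gate near 1, contributions accumulate over many steps
# instead of decaying multiplicatively.
c = np.zeros(3)
for _ in range(10):
    c = lstm_cell_state_step(c, f_logit=2.0, i_logit=0.0, g_preact=np.ones(3))
```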
Review for NeurIPS paper: MomentumRNN: Integrating Momentum into Recurrent Neural Networks
This was *not* demonstrated -- empirically faster convergence does not equate to acceleration in the optimization sense. It is better to use precise language to separate what you have shown (an architecture with faster convergence) from what you are hypothesizing (that it is related to momentum acceleration). As presented, I am not convinced of the latter connection.

I think it's fine to say that your method is *inspired* by momentum, but in my opinion the paper implies a much stronger connection that is not substantiated by the theoretical and empirical results. I currently remain unconvinced that the proposed method's improvements are related to momentum at all. There are plenty of simpler explanations, as also offered by Reviewers 3 and 4, which should at least be discussed and ideally ablated.

Ultimately, I see this paper as posing an interesting possible connection, but one that is currently speculative and not ready for publication. Aside from the overall writing, I have a few more detailed suggestions for improving the paper.
Authors: Nguyen, Tan M., Baraniuk, Richard G., Bertozzi, Andrea L., Osher, Stanley J., Wang, Bao
Code: https://github.com/minhtannguyen/MomentumRNN