Review for NeurIPS paper: MomentumRNN: Integrating Momentum into Recurrent Neural Networks

The claimed acceleration was *not* demonstrated -- empirically faster convergence is not equivalent to acceleration in the optimization-theoretic sense. It would be better to use precise language that separates what has been shown (an architecture that converges faster in training) from what is being hypothesized (that this is related to momentum acceleration). As presented, I am not convinced of the latter connection.

I think it is fine to say that the method is *inspired* by momentum, but in my opinion the paper implies a much stronger connection that is not substantiated by the theoretical or empirical results. I remain unconvinced that the proposed method's improvements are related to momentum at all. There are simpler explanations, as also offered by Reviewers 3 and 4, which should at least be discussed and ideally ablated.

Ultimately, I see this paper as posing an interesting possible connection, but one that is currently speculative and not ready for publication. Beyond the overall writing, I have a few more detailed suggestions for improving the paper.
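To make the distinction concrete (my notation, not the paper's): in optimization, "acceleration" refers to a provably improved convergence rate, not merely to faster empirical training curves. Classical heavy-ball momentum on a smooth objective $f$ can be sketched as:

```latex
% Heavy-ball momentum (Polyak); notation is the reviewer's, not the paper's.
% v_t: velocity, \mu: momentum coefficient, \eta: step size.
v_{t+1} = \mu v_t - \eta \nabla f(x_t), \qquad x_{t+1} = x_t + v_{t+1}
```

"Acceleration" in the optimization sense means an improved rate guarantee, e.g. Nesterov's method achieves $O(1/t^2)$ on smooth convex $f$ versus $O(1/t)$ for plain gradient descent. That is a property one must prove; it cannot be inferred from an architecture's faster empirical convergence.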