Reviews: Root Mean Square Layer Normalization

Neural Information Processing Systems 

The authors present a new form of normalization for deep networks called RMSNorm. Because the method only requires a single pass of statistics calculations, the authors demonstrate improved training times for both machine translation and image caption retrieval while maintaining predictive accuracy. As commented by the reviewers, the paper is clearly written; the results are clearly presented and the experiments are quite thorough (different ML systems; ML architectures). In sum, the results and convincing (1 reviewer upgrade their score accordingly) and the results are use-able by those that build language models and potentially other forms of deep networks that require normalization schemes. For these reasons, assuming the authors revise the paper to address all reviewer comments, this paper is accepted to this conference.