Review for NeurIPS paper: Regularized linear autoencoders recover the principal components, eventually

Neural Information Processing Systems 

Summary and Contributions: Post-rebuttal update: Thank you for clarifying the motivation of this work as well as the mistake in the proof. I'm updating my score, since the two-stage convergence behavior is interesting and the proposed algorithm has interesting connections to prior work. However, I'm not sure the modified linear AE model is the best model for understanding the slowness of learning NN representations, as the regularization schemes seem somewhat artificial and don't appear to correspond to any commonly used algorithm. From a probabilistic perspective, it's also unclear why we would assign an arbitrary non-uniform prior when we don't have any knowledge about the latent scales, e.g., is there any gain from choosing a more accurate prior over the scales? I think further discussion of these issues would greatly enhance this work.
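For context on the point about the non-uniform regularization: the role of distinct per-dimension penalties is to break the rotational symmetry of the latent space that a uniform L2 penalty leaves intact, which is what lets the regularized linear AE single out the individual principal components. A minimal NumPy sketch of this symmetry argument (the specific loss form, shapes, and penalty values here are my own illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))   # 5 features, 100 samples
W1 = rng.standard_normal((2, 5))    # encoder (latent dim 2)
W2 = rng.standard_normal((5, 2))    # decoder

def loss(W1, W2, lam):
    """Reconstruction loss plus per-latent-dimension L2 penalties lam."""
    recon = np.sum((X - W2 @ W1 @ X) ** 2)
    # penalize encoder rows and decoder columns with per-dimension weights
    reg = np.sum(lam[:, None] * W1**2) + np.sum(lam[None, :] * W2**2)
    return recon + reg

# an arbitrary rotation of the 2-d latent space
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

uniform = np.array([1.0, 1.0])
nonuniform = np.array([1.0, 2.0])

# uniform penalty: loss is invariant to latent rotation (symmetry intact)
l_u = loss(W1, W2, uniform)
l_u_rot = loss(R @ W1, W2 @ R.T, uniform)

# non-uniform penalty: the same rotation changes the loss (symmetry broken)
l_n = loss(W1, W2, nonuniform)
l_n_rot = loss(R @ W1, W2 @ R.T, nonuniform)
```

Under the uniform penalty, `l_u == l_u_rot` (the reconstruction term is unchanged since `W2 @ R.T @ R @ W1 == W2 @ W1`, and the Frobenius-norm penalty is rotation-invariant), whereas `l_n != l_n_rot` generically. This is the sense in which the non-uniform weights act like a non-uniform prior over latent scales, which is the choice the question above asks the authors to justify.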