Review for NeurIPS paper: Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Neural Information Processing Systems 

Weaknesses: - My main concern is that I don't see the benefit of modeling the data as a union of subspaces, with each subspace corresponding to a class, when the representation space is *learned*, in particular since these subspaces won't be orthogonal in practice on real data. In an unsupervised setting, recovering the subspaces requires subspace clustering, which is a hard and computationally expensive problem. In stark contrast, a linear head trained with a cross-entropy loss learns a representation space with approximately linearly separable regions for each class. As a consequence, classification is simple (linear) and Lp distances in representation space are meaningful (which is not necessarily the case when the classes lie on a union of subspaces). However, there are many other methods that can make neural networks with a linear classification head more robust, for example [c].
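
To make the point about distances concrete, here is a minimal sketch of a toy case, not taken from the paper: two classes lying on two non-orthogonal one-dimensional subspaces (lines through the origin) in the plane. The 30-degree angle, the sample points, and the helper `nearest_subspace` are all illustrative assumptions. The sketch shows that two points from different classes can be much closer in Euclidean distance than two points from the same class, even though assigning each point to the subspace with the smallest projection residual recovers the classes.

```python
import numpy as np

# Two 1-D subspaces (lines through the origin) in R^2, 30 degrees apart.
# The angle and the sample points are illustrative assumptions.
theta = np.deg2rad(30.0)
u0 = np.array([1.0, 0.0])                      # unit basis vector of class 0
u1 = np.array([np.cos(theta), np.sin(theta)])  # unit basis vector of class 1


def nearest_subspace(x, bases):
    """Hypothetical helper: index of the line with the smallest projection residual."""
    residuals = [np.linalg.norm(x - (x @ u) * u) for u in bases]
    return int(np.argmin(residuals))


a = 0.1 * u0  # class 0, close to the origin
b = 5.0 * u0  # class 0, far from the origin
c = 0.1 * u1  # class 1, close to the origin

d_same = np.linalg.norm(a - b)  # distance within class 0
d_diff = np.linalg.norm(a - c)  # distance across classes

# Euclidean distance ranks the cross-class pair as far more similar.
euclid_misleading = bool(d_diff < d_same)

# Subspace membership still separates the classes.
labels = [nearest_subspace(x, [u0, u1]) for x in (a, b, c)]

print(f"same-class distance:  {d_same:.4f}")
print(f"cross-class distance: {d_diff:.4f}")
print("labels by nearest subspace:", labels)
```

In this toy case a Euclidean nearest-neighbor rule would pair `a` with `c`, so the class structure is only visible to a procedure that knows (or has clustered) the subspaces, which is the extra cost raised above.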