Review for NeurIPS paper: Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

Neural Information Processing Systems 

Additional Feedback: - The two-cluster case is a convex optimization problem for a linear model and has already been investigated in a slightly different context [21]. The three-cluster case is therefore the more nontrivial and exciting one. However, I am not sure that the DMFT formulation in the three-cluster case is tractable enough to analyze the behavior of the SGD dynamics. Since the three-cluster case is a non-convex optimization problem, I suspect that the DMFT equations (20) have multiple local optima. If this is the case, it becomes unclear how typical the dynamics shown in the three-cluster experiments are.
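
One way to probe the typicality concern numerically would be to run SGD from several random initializations on the same three-cluster dataset and compare where the runs end up. The following is only a minimal sketch under assumed conventions, not the paper's setup: the data model, the quadratic pre-activation with a fixed `threshold`, the scalings, and all function names (`make_three_cluster_data`, `loss_and_grad`, `run_sgd`, `compare_initialisations`) are hypothetical choices made for illustration.

```python
import numpy as np

def make_three_cluster_data(n, d, delta, rng):
    """Toy three-cluster mixture (assumed setup, not taken from the paper)."""
    v = rng.standard_normal(d)                       # hypothetical cluster direction
    c = rng.choice([-1, 0, 1], size=n, p=[0.25, 0.5, 0.25])
    x = np.outer(c, v) / np.sqrt(d) + np.sqrt(delta) * rng.standard_normal((n, d))
    y = np.where(c == 0, 1.0, -1.0)                  # centre cluster vs. two outer clusters
    return x, y, v

def loss_and_grad(w, x, y, threshold=1.0):
    """Logistic loss on a quadratic pre-activation; non-convex in w."""
    n, d = x.shape
    s = x @ w / np.sqrt(d)                           # projection of each sample on w
    margin = y * (threshold - s ** 2)
    loss = np.mean(np.logaddexp(0.0, -margin))       # log(1 + exp(-margin)), stable
    sig = 0.5 * (1.0 - np.tanh(0.5 * margin))        # sigmoid(-margin), stable
    grad = x.T @ (2.0 * sig * y * s) / (n * np.sqrt(d))
    return loss, grad

def run_sgd(x, y, w0, lr=0.5, batch=32, steps=500, seed=0):
    """Plain mini-batch SGD with a fixed learning rate."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        _, g = loss_and_grad(w, x[idx], y[idx])
        w -= lr * g
    return w

def compare_initialisations(n_runs=5, n=400, d=50, delta=0.5, seed=0):
    """Return (final training loss, overlap with v) for each random initialization."""
    rng = np.random.default_rng(seed)
    x, y, v = make_three_cluster_data(n, d, delta, rng)
    results = []
    for r in range(n_runs):
        w0 = rng.standard_normal(d)
        w = run_sgd(x, y, w0, seed=r)
        final_loss, _ = loss_and_grad(w, x, y)
        results.append((float(final_loss), float(w @ v) / d))
    return results

if __name__ == "__main__":
    for final_loss, overlap in compare_initialisations():
        print(f"final loss = {final_loss:.4f}, overlap = {overlap:+.4f}")
```

In this toy loss the weights enter only through the squared projection, so `w` and `-w` give the same loss and the sign of the overlap alone does not distinguish two optima. A spread in the final losses or in the magnitude of the overlap across runs would be the more informative signal, and a comparable check against the numerical solution of the DMFT equations would address the question raised above.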