Reviews: Benefits of over-parameterization with EM

Neural Information Processing Systems 

I would suggest elaborating more on the optimization landscape in the paper. Finally, the mixture of two Gaussians is a very special case in which EM converges because the landscape has no bad local optima. The paper does not discuss the following relevant results: (a) Jin, Chi, et al. "Local maxima in the likelihood of Gaussian mixture models: Structural results and algorithmic consequences."
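To make the "no bad local optima" point concrete, here is a minimal sketch of EM in an assumed simplified setting: a balanced, known-variance symmetric mixture 0.5*N(theta, 1) + 0.5*N(-theta, 1) in one dimension. This may not match the paper's exact model, and the function names are hypothetical. In this setting the E-step weight of the +theta component is sigmoid(2*theta*x), so the EM update reduces to theta <- mean(x * tanh(theta * x)). From any nonzero start it moves to the optimum whose sign matches the initialization.

```python
import math
import random


def sample_symmetric_mixture(theta_star, n, seed=0):
    """Hypothetical helper: draw n points from 0.5*N(theta_star,1) + 0.5*N(-theta_star,1)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) * theta_star + rng.gauss(0.0, 1.0) for _ in range(n)]


def em_symmetric_mixture(xs, theta0, iters=200, tol=1e-10):
    """Hypothetical sketch: EM for theta in 0.5*N(theta,1) + 0.5*N(-theta,1).

    E-step: posterior weight of the +theta component is w = sigmoid(2*theta*x),
    so 2*w - 1 = tanh(theta*x).
    M-step: maximizing the expected log-likelihood gives theta = mean((2*w - 1) * x).
    """
    theta = theta0
    for _ in range(iters):
        new_theta = sum(x * math.tanh(theta * x) for x in xs) / len(xs)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


xs = sample_symmetric_mixture(2.0, 5000, seed=1)
print(em_symmetric_mixture(xs, 0.1))   # close to +2.0
print(em_symmetric_mixture(xs, -0.1))  # close to -2.0
```

Note that theta = 0 is also a fixed point of this update, but an unstable one. Near zero the update is roughly theta * mean(x^2), which is larger than theta in magnitude, so small perturbations move away from it. The two nonzero fixed points are equally good (they differ only by label swapping). Jin et al. show that this benign behavior does not carry over to mixtures with three or more components.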