Review for NeurIPS paper: OTLDA: A Geometry-aware Optimal Transport Approach for Topic Modeling

Neural Information Processing Systems 

Additional Feedback: Some minor suggestions and typos:
- Line 21: missing an "and".
- Line 33: "while other developed" should read "while other authors developed".
- Line 50, and elsewhere in the paper: it is stated that LDA/PLSI use a squared Euclidean loss/distance. This is untrue: both models use likelihood-based inference with a multinomial model, and/or Bayesian inference. The older LSI model uses a squared loss, but even the PLSI paper argued that this is insufficient (the implicit Gaussian assumption behind squared errors does not hold for the small counts typical of text data), which motivates the probabilistic modeling approach in PLSI and LDA.
- The other 2013 papers by Mikolov et al. are more fundamental references and would be better here, especially: Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality.
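
To make the Line 50 comment concrete, the two objectives can be written side by side. This is only a sketch in notation introduced here for illustration, not taken from the paper under review: $X$ is the document-term matrix, $n(d,w)$ is the count of word $w$ in document $d$, and $z$ indexes latent topics.

```latex
% LSI: best rank-k approximation under squared (Frobenius) error,
% obtained from a truncated SVD of X.
\hat{X}_k = \arg\min_{\operatorname{rank}(\hat{X}) \le k} \; \| X - \hat{X} \|_F^2
          = \arg\min_{\operatorname{rank}(\hat{X}) \le k} \; \sum_{d,w} \left( x_{dw} - \hat{x}_{dw} \right)^2

% PLSI: maximum likelihood under a multinomial mixture model.
\mathcal{L} = \sum_{d} \sum_{w} n(d,w) \, \log \sum_{z} P(w \mid z) \, P(z \mid d)
```

The first objective corresponds to an implicit Gaussian noise model on the counts, whereas the second is a multinomial log-likelihood; LDA keeps the multinomial likelihood and adds Dirichlet priors. Neither PLSI nor LDA minimizes a squared Euclidean distance.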