Reviews: Supervised Word Mover's Distance

Neural Information Processing Systems 

Overall, the paper reads like a nice combination of existing tricks and provides very convincing experimental results. Its strengths are simplicity and a relatively straightforward idea that is nonetheless not trivial to implement and test. The experimental section is therefore a strong part of this paper. Things to improve: handle the interplay between the regularized and unregularized formulations better, be more rigorous with the maths (computations and notations are a bit sloppy), and ideally provide an algorithm box to show more clearly what the authors propose.

A few minor comments:

- In Eq. 1, the Euclidean distance between word embeddings is used as the cost; in Eq. 6, for the purpose of Mahalanobis metric learning, that cost becomes the squared Euclidean distance (and the resulting quantity is thus what is usually referred to as 2-Wasserstein).
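To make the mismatch between the two costs concrete, here is a minimal sketch in plain Python. It assumes that the Eq. 6 cost has the form ||A(x - y)||^2 for a linear map A, which is a reading of the description above and not a quotation from the paper. The function names, the vectors `x` and `y`, and the matrix `identity` are all hypothetical and chosen only for illustration.

```python
import math

def euclidean_cost(x, y):
    """Euclidean distance ||x - y||_2, the kind of cost described for Eq. 1."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_mahalanobis_cost(x, y, A):
    """Squared distance ||A (x - y)||_2^2, the assumed form of the Eq. 6 cost."""
    diff = [a - b for a, b in zip(x, y)]
    # Apply the linear map A (a list of rows) to the difference vector.
    proj = [sum(row[k] * diff[k] for k in range(len(diff))) for row in A]
    return sum(p * p for p in proj)

# Hypothetical 2-d "embeddings" and an identity map, for illustration only.
x = [0.0, 0.0]
y = [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

c_eq1 = euclidean_cost(x, y)                      # 5.0
c_eq6 = squared_mahalanobis_cost(x, y, identity)  # 25.0
```

Even with A set to the identity, the second cost is the square of the first, so the two transport problems use different ground costs and need not share the same optimal plan. It would help if the paper stated explicitly which of the two is meant throughout.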