Reviews: Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance

Neural Information Processing Systems 

In this paper the authors advocate using the Sinkhorn distance instead of the "regularized" Sinkhorn distance for computing the divergence between discrete distributions. They show that the former yields better gradients, which lead to sharper results, especially for barycenters. They also provide a closed-form expression for the gradient of the Sinkhorn distance using the implicit function theorem. Another contribution is a new generalization bound for structured prediction with the Wasserstein distance. The numerical experiments are very short but show a better barycenter with the Sinkhorn distance and better image reconstruction.
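To make the distinction between the two quantities concrete, here is a minimal NumPy sketch. It is not the authors' code. Both values are evaluated at the same entropic transport plan T_eps, which is computed by plain Sinkhorn matrix scaling. The "sharp" Sinkhorn distance is taken to be the transport cost <T_eps, M>. The regularized version also includes the entropy term, using the convention h(T) = -sum T (log T - 1), which is an assumption here. The function names `sinkhorn_plan`, `sharp_sinkhorn` and `regularized_sinkhorn` are hypothetical. The sketch does not reproduce the paper's implicit-function-theorem gradient.

```python
import numpy as np


def sinkhorn_plan(a, b, M, eps, n_iter=1000):
    """Entropic optimal transport plan via Sinkhorn matrix scaling (sketch)."""
    K = np.exp(-M / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]  # T = diag(u) K diag(v)


def sharp_sinkhorn(a, b, M, eps):
    """Transport cost <T_eps, M> of the entropic plan, without the entropy term."""
    T = sinkhorn_plan(a, b, M, eps)
    return float(np.sum(T * M))


def regularized_sinkhorn(a, b, M, eps):
    """<T_eps, M> - eps * h(T_eps), assuming h(T) = -sum T (log T - 1)."""
    T = sinkhorn_plan(a, b, M, eps)
    neg_entropy = np.sum(T * (np.log(T) - 1.0))  # equals -h(T)
    return float(np.sum(T * M) + eps * neg_entropy)


# Usage: two identical two-point histograms, so the exact Wasserstein cost is 0.
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
M = np.array([[0.0, 1.0], [1.0, 0.0]])
print(sharp_sinkhorn(a, b, M, eps=0.05))        # close to 0
print(regularized_sinkhorn(a, b, M, eps=0.05))  # approx. -0.085, biased by the entropy term
```

The value eps = 0.05 is an arbitrary illustrative choice. In this toy case the sharp value stays near the true cost of 0, while the regularized value is shifted by roughly eps times the (negative) entropy of the plan. This shift is one way to see why the two objectives can lead to visibly different barycenters.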