Differentiable Ranking and Sorting using Optimal Transport

Neural Information Processing Systems 

Sorting is used pervasively in machine learning, either to define elementary algorithms, such as k -nearest neighbors ( k -NN) rules, or to define test-time metrics, such as top- k classification accuracy or ranking losses. Sorting is however a poor match for the end-to-end, automatically differentiable pipelines of deep learning. Indeed, sorting procedures output two vectors, neither of which is differentiable: the vector of sorted values is piecewise linear, while the sorting permutation itself (or its inverse, the vector of ranks) has no differentiable properties to speak of, since it is integer-valued. We propose in this paper to replace the usual \texttt{sort} procedure with a differentiable proxy. Our proxy builds upon the fact that sorting can be seen as an optimal assignment problem, one in which the n values to be sorted are matched to an \emph{auxiliary} probability measure supported on any \emph{increasing} family of n target values.