Ranking Data with Continuous Labels through Oriented Recursive Partitions

Clémençon, Stéphan, Achab, Mastane

Neural Information Processing Systems 

We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space X and the goal is to order all possible observations x in X by means of a scoring function s: X R so that s(X) and Y tend to increase or decrease together with highest probability. This problem generalizes bi/multi-partite ranking to a certain extent and the task of finding optimal scoring functions s(x) can be naturally cast as optimization of a dedicated functional cri- terion, called the IROC curve here, or as maximization of the Kendall τ related to the pair (s(X), Y). From the theoretical side, we describe the optimal elements of this problem and provide statistical guarantees for empirical Kendall τ maximiza- tion under appropriate conditions for the class of scoring function candidates. We also propose a recursive statistical learning algorithm tailored to empirical IROC curve optimization and producing a piecewise constant scoring function that is fully described by an oriented binary tree.