minimum conditional risk
A Tunable Loss Function for Binary Classification
Sypherd, Tyler, Diaz, Mario, Sankar, Lalitha, Kairouz, Peter
We prove that α-loss has an equivalent margin-based form and is classification-calibrated, two desirable properties for a good surrogate loss function for the ideal yet intractable 0-1 loss. For logistic regression-based classification, we provide an upper bound on the difference between the empirical and expected risk for α-loss by exploiting its Lipschitzianity along with recent results on the landscape features of empirical risk functions. Finally, we show that α-loss with α 2 performs better than log-loss on MNIST for logistic regression. I. INTRODUCTION In learning theory, the performance of a classification algorithm interms of accuracy, tractability, and convergence guarantees is contingent on the choice of a loss function. Consider a feature vector X X, an unknown finite label Y Y, and a hypothesis test h: X Y. The canonical 0-1 loss, given by 1[h(X) Y ], is considered an ideal loss function that captures the probability of incorrectly guessing the true label Y using h(X).
On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost
Masnadi-shirazi, Hamed, Vasconcelos, Nuno
The machine learning problem of classifier design is studied from the perspective of probability elicitation, in statistics. This shows that the standard approach of proceeding from the specification of a loss, to the minimization of conditional risk is overly restrictive. It is shown that a better alternative is to start from the specification of a functional form for the minimum conditional risk, and derive the loss function. This has various consequences of practical interest, such as showing that 1) the widely adopted practice of relying on convex loss functions is unnecessary, and 2) many new losses can be derived for classification problems. These points are illustrated by the derivation of a new loss which is not convex, but does not compromise the computational tractability of classifier design, and is robust to the contamination of data with outliers. A new boosting algorithm, SavageBoost, is derived for the minimization of this loss. Experimental results show that it is indeed less sensitive to outliers than conventional methods, such as Ada, Real, or LogitBoost, and converges in fewer iterations.