Collaborating Authors

 Yang, Forest


Fairness with Overlapping Groups

arXiv.org Machine Learning

Machine learning systems inform an increasingly large number of critical decisions in diverse settings. They assist medical diagnosis (McKinney et al., 2020), guide policing (Meijer and Wessels, 2019), and power credit scoring systems (Tsai and Wu, 2008). While these systems have demonstrated their value in many sectors, they are prone to unwanted biases that lead to discrimination against protected subgroups within the population. For example, recent studies have revealed biases in predictive policing and criminal sentencing systems (Meijer and Wessels, 2019; Chouldechova, 2017). The blossoming body of research in algorithmic fairness aims to study and address this issue by introducing novel algorithms that guarantee a certain level of non-discrimination in their predictions.


On the Consistency of Top-k Surrogate Losses

arXiv.org Machine Learning

The top-$k$ error is often employed to evaluate performance on challenging classification tasks in computer vision, as it is designed to compensate for ambiguity in ground truth labels. This practical success motivates our theoretical analysis of consistent top-$k$ classification. To this end, we define top-$k$ calibration as a necessary and sufficient condition for consistency, for loss functions that are bounded below. Unlike prior work, our analysis of top-$k$ calibration handles non-uniqueness of the predictor scores, and extends calibration to consistency -- providing a theoretically sound basis for analysis of this topic. Based on the top-$k$ calibration analysis, we propose a rich class of top-$k$ calibrated Bregman divergence surrogates. Our analysis continues by showing that previously proposed hinge-like top-$k$ surrogate losses are not top-$k$ calibrated and are thus inconsistent. We then propose two new hinge-like losses, one of which is similarly inconsistent and one of which is consistent. Our empirical results highlight these theoretical claims, confirming our analysis of the consistency of these losses.
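As an illustration of the quantities involved (a minimal sketch, not code from the paper), the top-$k$ error and one familiar member of the Bregman divergence family of surrogates, softmax cross-entropy, can be written in a few lines of NumPy. The function names and the argsort tie-breaking convention here are our own choices:

import numpy as np

def top_k_error(scores, labels, k):
    # Fraction of examples whose true label is not among the k highest scores.
    # Ties are broken by argsort order, so with non-unique scores this picks
    # one of several valid top-k sets (the non-uniqueness noted in the abstract).
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = np.any(top_k == labels[:, None], axis=1)
    return 1.0 - hits.mean()

def softmax_cross_entropy(scores, labels):
    # Softmax cross-entropy, a standard Bregman-divergence-based surrogate
    # (the KL divergence case), shifted for numerical stability.
    z = scores - scores.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

For example, with scores of shape (n_examples, n_classes) and integer labels, top_k_error(scores, labels, 5) gives the usual top-5 error reported in vision benchmarks.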


Kernel-based Outlier Detection using the Inverse Christoffel Function

arXiv.org Machine Learning

Outlier detection methods have become increasingly relevant in recent years due to heightened security concerns and their vast range of applications across different fields. Recently, Pauwels and Lasserre (2016) noticed that the sublevel sets of the inverse Christoffel function accurately depict the shape of a cloud of data using a sum-of-squares polynomial and can be used to perform outlier detection. In this work, we propose a kernelized variant of the inverse Christoffel function that makes it computationally tractable for data sets with a large number of features. We compare our approach to current methods on 15 different data sets and achieve the best average area under the precision-recall curve (AUPRC) score, the best average rank, and the lowest root mean square deviation.
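To make the kernelization concrete, here is a minimal sketch of one way such a score can be computed with only kernel evaluations: regularize the empirical second-moment matrix of the feature map and apply the Woodbury identity. The RBF kernel, the regularization parameter reg, and the normalization are our assumptions for illustration, not necessarily the paper's exact formulation:

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def christoffel_scores(X_train, X_test, reg=1e-3, gamma=1.0):
    # Outlier scores from a regularized inverse-Christoffel-style quantity:
    #   s(x) = phi(x)^T (M + reg*I)^{-1} phi(x),
    # where M = (1/n) sum_i phi(x_i) phi(x_i)^T is the empirical second-moment
    # matrix of the feature map. The Woodbury identity rewrites this as
    #   s(x) = (k(x, x) - k_x^T (K + n*reg*I)^{-1} k_x) / reg,
    # so only kernel evaluations are needed. Larger scores indicate points
    # farther from the training cloud.
    n = X_train.shape[0]
    K = rbf_kernel(X_train, X_train, gamma)      # n x n Gram matrix
    k_x = rbf_kernel(X_train, X_test, gamma)     # n x m cross-kernel
    k_xx = np.ones(X_test.shape[0])              # k(x, x) = 1 for the RBF kernel
    solve = np.linalg.solve(K + n * reg * np.eye(n), k_x)
    return (k_xx - (k_x * solve).sum(axis=0)) / reg

A threshold on these scores, for instance a high quantile of the scores on the training data, then flags test points as outliers; the sublevel set of the score plays the role of the sum-of-squares sublevel set in the polynomial case.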