Mudd, Richard
Active learning with biased non-response to label requests
Robinson, Thomas, Tax, Niek, Mudd, Richard, Guy, Ido
Active learning can improve the efficiency of training prediction models by identifying the most informative new labels to acquire. However, non-response to label requests can degrade active learning's effectiveness in real-world contexts. We conceptualise this degradation by considering the type of non-response present in the data, demonstrating that biased non-response is particularly detrimental to model performance. We argue that this sort of non-response is particularly likely in contexts where the labelling process, by nature, relies on user interactions. To mitigate the impact of biased non-response, we propose a cost-based correction to the sampling strategy--the Upper Confidence Bound of the Expected Utility (UCB-EU)--that can plausibly be applied to any active learning algorithm. Through experiments, we demonstrate that our method successfully reduces the harm from labelling non-response in many settings. However, we also characterise settings where the non-response bias in the annotations remains detrimental under UCB-EU for particular sampling methods and data-generating processes. Finally, we evaluate our method on a real-world dataset from the e-commerce platform Taobao. We show that UCB-EU yields substantial performance improvements for conversion models trained on clicked impressions. More broadly, this research serves both to better conceptualise the interplay between types of non-response and model improvements via active learning, and to provide a practical, easy-to-implement correction that helps mitigate model degradation.
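As a rough sketch of the general idea rather than the paper's exact formulation, one plausible reading of a UCB-style correction weights each unlabelled candidate's expected utility by an optimistic (upper-confidence-bound) estimate of its probability of receiving a label. The function and argument names below (ucb_eu_scores, expected_utility, response_successes, response_trials, beta) are hypothetical placeholders, not taken from the paper.

import numpy as np

def ucb_eu_scores(expected_utility, response_successes, response_trials, beta=0.3):
    """Toy acquisition score: expected utility of a label request, weighted by an
    optimistic (upper-confidence-bound) estimate of the response probability."""
    utility = np.asarray(expected_utility, dtype=float)
    successes = np.asarray(response_successes, dtype=float)
    trials = np.asarray(response_trials, dtype=float)

    # Empirical response rate with a Beta(1, 1) prior to avoid division by zero.
    p_hat = (successes + 1.0) / (trials + 2.0)
    # Optimism bonus that shrinks as more label requests are observed.
    bonus = beta * np.sqrt(np.log(trials.sum() + 2.0) / (trials + 1.0))
    p_ucb = np.clip(p_hat + bonus, 0.0, 1.0)

    # Rank items by informativeness discounted by the optimistic response estimate.
    return utility * p_ucb

# Three candidates with equal utility but different response histories;
# the never-queried candidate keeps an optimistic response estimate.
print(ucb_eu_scores([0.5, 0.5, 0.5], [9, 1, 0], [10, 10, 0]).round(3))

Any base strategy's informativeness score, for example the predictive entropy of the current model, could be supplied as expected_utility.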
Explaining Predictive Uncertainty with Information Theoretic Shapley Values
Watson, David S., O'Hara, Joshua, Tax, Niek, Mudd, Richard, Guy, Ido
Researchers in explainable artificial intelligence have developed numerous methods for helping users understand the predictions of complex supervised learning models. By contrast, explaining the uncertainty of model outputs has received relatively little attention. We adapt the popular Shapley value framework to explain various types of predictive uncertainty, quantifying each feature's contribution to the conditional entropy of individual model outputs. We consider games with modified characteristic functions and find deep connections between the resulting Shapley values and fundamental quantities from information theory and conditional independence testing. We outline inference procedures for finite-sample error rate control with provable guarantees, and implement efficient algorithms that perform well in a range of experiments on real and simulated data. Our method has applications to covariate shift detection, active learning, feature selection, and active feature-value acquisition.
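As a hedged sketch of the general recipe, not the authors' implementation, the snippet below estimates each feature's Shapley contribution to the predictive entropy at a point, using a characteristic function that fixes in-coalition features and marginalises the remaining ones over a background sample. The inputs predict_proba_fn and background are assumptions for the sketch.

import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of each row of predicted class probabilities."""
    probs = np.clip(probs, eps, 1.0)
    return -(probs * np.log(probs)).sum(axis=1)

def entropy_value(predict_proba_fn, x, coalition, background):
    """Characteristic function v(S): mean predictive entropy when features in
    `coalition` are fixed to x and the rest are drawn from the background sample."""
    cols = list(coalition)
    imputed = background.copy()
    imputed[:, cols] = x[cols]
    return predictive_entropy(predict_proba_fn(imputed)).mean()

def shapley_uncertainty(predict_proba_fn, x, background, n_permutations=200, seed=0):
    """Monte Carlo permutation estimate of per-feature contributions to the
    predictive entropy at point x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_permutations):
        coalition = []
        v_prev = entropy_value(predict_proba_fn, x, coalition, background)
        for j in rng.permutation(d):
            coalition.append(j)
            v_curr = entropy_value(predict_proba_fn, x, coalition, background)
            phi[j] += v_curr - v_prev
            v_prev = v_curr
    return phi / n_permutations

With a fitted scikit-learn classifier, predict_proba_fn could simply be clf.predict_proba, and background a sample of training rows used for the marginal imputation.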
TCE: A Test-Based Approach to Measuring Calibration Error
Matsubara, Takuo, Tax, Niek, Mudd, Richard, Guy, Ido
This paper proposes a new metric to measure the calibration error of probabilistic binary classifiers, called test-based calibration error (TCE). TCE incorporates a novel loss function based on a statistical test to examine the extent to which model predictions differ from probabilities estimated from data. It offers (i) a clear interpretation, (ii) a consistent scale that is unaffected by class imbalance, and (iii) an enhanced visual representation with respect to the standard reliability diagram. In addition, we introduce an optimality criterion for the binning procedure of calibration error metrics based on a minimal estimation error of the empirical probabilities.
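Sketched under assumptions rather than taken from the paper, the toy metric below illustrates the test-based idea: it bins predictions, applies a binomial test per bin comparing the mean predicted probability with the observed label frequency, and reports the share of predictions falling into bins where the test rejects. The name tce_sketch, the binning scheme, and the significance level are placeholders, not the paper's definition of TCE.

import numpy as np
from scipy.stats import binomtest

def tce_sketch(y_true, y_prob, n_bins=10, alpha=0.05):
    """Toy test-based calibration score: fraction of predictions landing in bins
    where a binomial test rejects agreement between the mean predicted
    probability and the observed label frequency."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)

    # Equal-mass bins over the predicted probabilities.
    edges = np.quantile(y_prob, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.searchsorted(edges, y_prob, side="right") - 1, 0, n_bins - 1)

    rejected = 0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.sum() == 0:
            continue
        p_pred = float(np.clip(y_prob[mask].mean(), 1e-9, 1 - 1e-9))
        k, n = int(y_true[mask].sum()), int(mask.sum())
        # Test whether the observed successes are consistent with the mean prediction.
        if binomtest(k, n, p=p_pred).pvalue < alpha:
            rejected += int(mask.sum())
    return rejected / len(y_prob)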