imagenet-16h
Appendix Conditional Independence Dependence in 10H and
We investigate the degree to which our conditional independence assumption is satisfied empirically in the datasets used in the paper. Specifically, of interest is the assumption of conditional independence of m(x) and h(x), given y. Assessing conditional independence is not straightforward given that m(x) is a K-dimensional real-valued vector and h(x) and y each take one of K categorical values, with K = 10 for CIFAR-10H and K = 16 for ImageNet-16H. While there exist statistical tests for assessing conditional independence for categorical random variables, with real-valued variables the situation is less straightforward and there are multiple options such as different non-parametric tests involving different tradeoffs [Runge, 2018, Marx and Vreeken, 2019, Mukherjee et al., 2020, Berrett et al., 2020]. Given these issues we investigate the degree of conditional dependence using two relatively simple approaches. The first approach looks at the conditional mutual information (CMI) between the predicted label from the model and the predicted label from the human, conditioned on the true label.
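The CMI between two predicted labels given the true label can be estimated from label counts. The following is a minimal sketch using a plug-in (empirical frequency) estimator; the function name, argument names, and the choice of estimator are assumptions made for illustration and may differ from what the authors used.

```python
import numpy as np


def conditional_mutual_information(model_labels, human_labels, true_labels, num_classes):
    """Plug-in estimate of I(M; H | Y) in nats from integer label arrays.

    Hypothetical helper: M is the model's predicted label, H the human's
    predicted label, Y the true label, each in {0, ..., num_classes - 1}.
    """
    m = np.asarray(model_labels)
    h = np.asarray(human_labels)
    y = np.asarray(true_labels)

    # Joint counts indexed as counts[y, m, h].
    counts = np.zeros((num_classes, num_classes, num_classes))
    np.add.at(counts, (y, m, h), 1)
    n = counts.sum()

    cmi = 0.0
    for k in range(num_classes):
        n_k = counts[k].sum()
        if n_k == 0:
            continue  # no examples with true label k
        p_mh = counts[k] / n_k                 # p(m, h | y = k)
        p_m = p_mh.sum(axis=1, keepdims=True)  # p(m | y = k), shape (K, 1)
        p_h = p_mh.sum(axis=0, keepdims=True)  # p(h | y = k), shape (1, K)
        mask = p_mh > 0                        # skip zero cells (0 * log 0 = 0)
        ratio = p_mh[mask] / (p_m @ p_h)[mask]
        cmi += (n_k / n) * np.sum(p_mh[mask] * np.log(ratio))
    return cmi
```

A value near zero is consistent with conditional independence of the two predicted labels. The plug-in estimate is biased upward on finite samples, so in practice it would need to be compared against a reference, for example the values obtained after shuffling the human labels within each true class.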
Bayesian Online Learning for Consensus Prediction
Showalter, Sam, Boyd, Alex, Smyth, Padhraic, Steyvers, Mark
Given a pre-trained classifier and multiple human experts, we investigate the task of online classification where model predictions are provided for free but querying humans incurs a cost. In this practical but under-explored setting, oracle ground truth is not available. Instead, the prediction target is defined as the consensus vote of all experts. Given that querying full consensus can be costly, we propose a general framework for online Bayesian consensus estimation, leveraging properties of the multivariate hypergeometric distribution. Based on this framework, we propose a family of methods that dynamically estimate expert consensus from partial feedback by producing a posterior over expert and model beliefs. Analyzing this posterior induces an interpretable trade-off between querying cost and classification performance. We demonstrate the efficacy of our framework against a variety of baselines on CIFAR-10H and ImageNet-16H, two large-scale crowdsourced datasets.
Combining Human Predictions with Model Probabilities via Confusion Matrices and Calibration
Kerrigan, Gavin, Smyth, Padhraic, Steyvers, Mark
An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human nor the model is perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probabilistic output of a model with the class-level output of a human. We show theoretically that the accuracy of our combination model is driven not only by the individual human and model accuracies, but also by the model's confidence. Empirical results on image classification with CIFAR-10 and a subset of ImageNet demonstrate that such human-model combinations consistently have higher accuracies than the model or human alone, and that the parameters of the combination method can be estimated effectively with as few as ten labeled datapoints.
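One way a model's class probabilities could be combined with a human's class label through a confusion matrix is a Bayes-rule update. The sketch below assumes the human label and the model output are conditionally independent given the true label, and that the model probabilities are already calibrated; the function and argument names are hypothetical, and this is an illustration of the general idea rather than the paper's exact algorithm.

```python
import numpy as np


def combine_human_and_model(model_probs, human_label, confusion):
    """Posterior over the true class y given model probabilities and a human label.

    Assumes p(y | m, h) is proportional to p(h | y) * p(y | m), where
    confusion[y, h] is an estimate of p(h | y) (rows: true class,
    columns: human label). All names here are illustrative.
    """
    model_probs = np.asarray(model_probs, dtype=float)
    confusion = np.asarray(confusion, dtype=float)

    likelihood = confusion[:, human_label]  # p(h = human_label | y) for each y
    unnormalized = likelihood * model_probs
    total = unnormalized.sum()
    if total == 0:
        # The human label is impossible under every class the model supports;
        # fall back to the model's own probabilities.
        return model_probs
    return unnormalized / total


# Example: the model leans toward class 0, but a fairly reliable human says class 1.
posterior = combine_human_and_model(
    model_probs=[0.6, 0.4],
    human_label=1,
    confusion=[[0.9, 0.1],
               [0.2, 0.8]],
)
predicted_class = int(np.argmax(posterior))
```

In this example the human's label outweighs the model's weak preference, so the combined prediction is class 1. A more confident model (for instance probabilities of 0.95 and 0.05) would keep the combined prediction at class 0, which illustrates how model confidence affects the combination.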