Robust Sampling for Active Statistical Inference

Puheng Li, Tijana Zrnic, Emmanuel Candès

arXiv.org Machine Learning 

Collecting high-quality labeled data remains a challenge in data-driven research, especially when each label is costly and time-consuming to obtain. In response, many fields have embraced machine learning as a practical solution for predicting unobserved labels, such as annotating satellite imagery in remote sensing [46] and predicting protein structures in proteomics [24]. Prediction-powered inference [1] is a methodological framework showing how to perform valid statistical inference despite the inherent biases in such predicted labels. Active statistical inference [51] was recently introduced to further enhance inference by actively selecting which data points to label. The basic idea is to compute the model's uncertainty scores for all data points and prioritize collecting the labels for which the predictive model is most uncertain. When the uncertainty scores appropriately reflect the model's errors, Zrnic and Candès [51] show that active inference can significantly outperform prediction-powered inference (which can essentially be thought of as active inference with naive uniform sampling), meaning it results in more accurate estimates and narrower confidence intervals. However, when the uncertainty scores are of poor quality, active inference can result in overly noisy estimates and large confidence intervals.
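To make the basic idea concrete, the following is a minimal sketch of uncertainty-based sampling with an inverse-probability-weighted correction on the labeled points. The data, the uncertainty scores, and all variable names are hypothetical illustrations, not the authors' implementation; the estimator shown is the standard design-based correction that keeps the estimate unbiased for any positive sampling probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000                                  # size of the unlabeled pool
y = rng.normal(0.0, 1.0, size=n)             # true labels (costly to observe)
errors = rng.normal(0.0, 0.5, size=n)
preds = y + errors                           # imperfect model predictions

# Suppose the model's uncertainty scores track its absolute errors
# (a hypothetical, favorable setting for active sampling).
uncertainty = np.abs(errors) + 0.1

# Spend the labeling budget in proportion to uncertainty, clipped so
# every point keeps a nonzero chance of being labeled.
budget = 5_000
pi = np.clip(budget * uncertainty / uncertainty.sum(), 0.01, 1.0)
labeled = rng.random(n) < pi                 # Bernoulli(pi) label requests

# Unbiased estimate of the mean label: start from the predictions and
# correct them on the labeled points, reweighted by 1 / pi.
correction = labeled * (y - preds) / pi
active_estimate = np.mean(preds + correction)
```

Because the correction term has mean `y - preds` under the sampling design, `active_estimate` is unbiased for the population mean of `y` while observing only a small fraction of the labels; concentrating the budget where the model errs most is what shrinks its variance relative to uniform sampling.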
