Robust Sampling for Active Statistical Inference

Puheng Li, Tijana Zrnic, Emmanuel Candès

arXiv.org Machine Learning 

Collecting high-quality labeled data remains a challenge in data-driven research, especially when each label is costly and time-consuming to obtain. In response, many fields have embraced machine learning as a practical solution for predicting unobserved labels, such as annotating satellite imagery in remote sensing [46] and predicting protein structures in proteomics [24]. Prediction-powered inference [1] is a methodological framework showing how to perform valid statistical inference despite the inherent biases in such predicted labels. Active statistical inference [51] was recently introduced to further enhance inference by actively selecting which data points to label. The basic idea is to compute the model's uncertainty scores for all data points and prioritize collecting the labels for which the predictive model is most uncertain. When the uncertainty scores appropriately reflect the model's errors, Zrnic and Candès [51] show that active inference can significantly outperform prediction-powered inference (which can essentially be thought of as active inference with naive uniform sampling), meaning it results in more accurate estimates and narrower confidence intervals. However, when the uncertainty scores are of poor quality, active inference can result in overly noisy estimates and large confidence intervals.
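To make the basic idea concrete, the following is a minimal sketch of uncertainty-based sampling with an inverse-probability-weighted correction on the labeled points. The data, the uncertainty scores, and all variable names are hypothetical illustrations, not the authors' implementation; the estimator shown is the standard design-based correction that keeps the estimate unbiased for any positive sampling probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000                                  # size of the unlabeled pool
y = rng.normal(0.0, 1.0, size=n)             # true labels (costly to observe)
errors = rng.normal(0.0, 0.5, size=n)
preds = y + errors                           # imperfect model predictions

# Suppose the model's uncertainty scores track its absolute errors
# (a hypothetical, favorable setting for active sampling).
uncertainty = np.abs(errors) + 0.1

# Spend the labeling budget in proportion to uncertainty, clipped so
# every point keeps a nonzero chance of being labeled.
budget = 5_000
pi = np.clip(budget * uncertainty / uncertainty.sum(), 0.01, 1.0)
labeled = rng.random(n) < pi                 # Bernoulli(pi) label requests

# Unbiased estimate of the mean label: start from the predictions and
# correct them on the labeled points, reweighted by 1 / pi.
correction = labeled * (y - preds) / pi
active_estimate = np.mean(preds + correction)
```

Because the correction term has mean `y - preds` under the sampling design, `active_estimate` is unbiased for the population mean of `y` while observing only a small fraction of the labels; concentrating the budget where the model errs most is what shrinks its variance relative to uniform sampling.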
