Predictive Inference with Weak Supervision
Maxime Cauchois, Suyash Gupta, Alnur Ali, John Duchi
Consider the typical supervised learning pipeline that we teach students of statistical machine learning: we collect data in (X, Y) pairs, where Y is a label or target to be predicted; we pick a model and a loss measuring the fidelity of the model to observed data; we choose the model minimizing the loss and validate it on held-out data. This picture obscures what is becoming one of the major challenges in this endeavor: that of actually collecting high-quality labeled data [44, 13, 38]. Hand labeling large-scale training sets is often impractically expensive. Consider, as simple motivation, a ranking problem: a prediction is an ordered list of a set of items, yet available feedback is likely to be incomplete and partial, such as only a top-ranked element (for example, in web search a user clicks on a single preferred link, or in a grocery store, a shopper buys one brand of milk but provides no feedback on the others present). Leveraging such partial and weak feedback is therefore becoming a major focus, and researchers have developed techniques to transform weak, noisy labels into datasets with strong, "gold-standard" labels [38, 56]. In this paper, we adopt this weakly labeled setting, but instead of considering model fitting and the construction of strong labels, we focus on validation, model confidence, and predictive inference, moving beyond point predictions and single labels. Our goal is to develop methods to rigorously quantify the confidence a practitioner should have in a model given only weak labels.
Feb-9-2022
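To make concrete what "predictive inference beyond point predictions" means, the sketch below shows split conformal prediction, the standard strong-label baseline that work in this area generalizes. It is illustrative only: it assumes fully observed labels and a generic probabilistic classifier, all names are hypothetical, and it is not the paper's weak-supervision method.

```python
# Minimal sketch: split conformal prediction with fully observed ("strong")
# labels -- the baseline that weak-supervision variants relax.
# Function and variable names are illustrative, not from the paper.
import numpy as np

def conformal_prediction_sets(probs_cal, y_cal, probs_test, alpha=0.1):
    """Return prediction sets with marginal coverage at least 1 - alpha.

    probs_cal:  (n, K) predicted class probabilities on calibration data
    y_cal:      (n,)   true calibration labels in {0, ..., K-1}
    probs_test: (m, K) predicted class probabilities on test points
    """
    n = len(y_cal)
    # Conformity score: one minus the probability the model puts on the true label.
    scores = 1.0 - probs_cal[np.arange(n), y_cal]
    # Finite-sample-corrected (1 - alpha) quantile of the calibration scores.
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    qhat = np.sort(scores)[min(rank, n) - 1]
    # A label joins the set whenever its score is below the threshold, so the
    # output is a *set* of plausible labels rather than a single point prediction.
    return [np.flatnonzero(1.0 - p <= qhat) for p in probs_test]

# Toy usage with random "probabilities" standing in for a fitted model.
rng = np.random.default_rng(0)
probs_cal = rng.dirichlet(np.ones(5), size=200)
y_cal = rng.integers(0, 5, size=200)
probs_test = rng.dirichlet(np.ones(5), size=3)
print(conformal_prediction_sets(probs_cal, y_cal, probs_test, alpha=0.1))
```

Note that with only weak labels, the true-label calibration score above cannot be evaluated directly; that is exactly the validation gap the abstract describes.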