Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities
Álvarez, Verónica; Mazuelas, Santiago; An, Steven; Dasgupta, Sanjoy
Abstract--The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that provide rough guesses for labels. Weak LFs commonly provide guesses with assorted types and unknown interdependences that can result in unreliable predictions. This paper presents a methodology for programmatic weak supervision that can provide confidence intervals for label probabilities and obtain more reliable predictions. In particular, the methods proposed use uncertainty sets of distributions that encapsulate the information provided by LFs with unrestricted behavior and typology. Experiments on multiple benchmark datasets show the improvement of the presented methods over the state-of-the-art and the practicality of the confidence intervals presented.

For many machine learning applications, the accurate labeling of datasets is both costly and time-consuming [1]-[4]. Given an unlabeled dataset, methods for programmatic weak supervision aim to leverage multiple weak labeling functions (LFs) to provide accurate labels [5], [6]. Since common LFs only provide rough guesses for labels, programmatic weak supervision methods use the outputs of multiple LFs to obtain probabilistic predictions for the label of each instance [7]-[13]. These predictions can then be used to create a fully supervised dataset composed of the instances corresponding to high-confidence predictions, e.g., a label with a large enough predicted probability is regarded as the actual

Manuscript received September 30, 2024; accepted August 4, 2025.
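The workflow outlined in the introduction (LF outputs, then probabilistic predictions, then a dataset built from high-confidence predictions) can be illustrated with a minimal sketch. It is not the method proposed in the paper: the probabilities come from smoothed vote counts, a simple baseline swapped in for illustration, and no confidence intervals are computed. The names `ABSTAIN`, `vote_probabilities`, and `select_confident`, the abstain convention, and the threshold value are all assumptions made for this example.

```python
from collections import Counter

ABSTAIN = -1  # assumed convention: an LF returns -1 when it offers no guess


def vote_probabilities(lf_outputs, n_classes):
    """Map each instance's LF votes to label probabilities.

    Baseline only: normalized vote counts with add-one smoothing.
    This is NOT the uncertainty-set approach described in the paper.
    """
    probs = []
    for votes in lf_outputs:
        counts = Counter(v for v in votes if v != ABSTAIN)
        smoothed = [counts[c] + 1 for c in range(n_classes)]
        total = sum(smoothed)
        probs.append([s / total for s in smoothed])
    return probs


def select_confident(probs, threshold):
    """Keep (instance index, label) pairs whose top probability >= threshold."""
    selected = []
    for i, p in enumerate(probs):
        best = max(range(len(p)), key=p.__getitem__)
        if p[best] >= threshold:
            selected.append((i, best))
    return selected


# Toy data: 4 instances, 3 hypothetical LFs, 2 classes.
votes = [
    [1, 1, 1],
    [0, 1, ABSTAIN],
    [0, 0, ABSTAIN],
    [ABSTAIN, ABSTAIN, ABSTAIN],
]
probs = vote_probabilities(votes, n_classes=2)
selected = select_confident(probs, threshold=0.7)
print(selected)
```

In this toy run only the first and third instances pass the threshold; the instance with conflicting votes and the one where every LF abstains are left unlabeled.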
Aug-7-2025
- Country:
- Europe > Spain
- Basque Country (0.04)
- Castile and León > Salamanca Province
- Salamanca (0.04)
- North America > United States
- California
- Alameda County > Berkeley (0.14)
- San Diego County
- Yolo County > Davis (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.14)
- South America > Chile
- Genre:
- Research Report (0.50)
- Industry:
- Leisure & Entertainment (0.92)