Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities
Verónica Álvarez, Santiago Mazuelas, Steven An, Sanjoy Dasgupta
Abstract--The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that provide rough guesses for labels. Weak LFs commonly provide guesses with assorted types and unknown interdependences that can result in unreliable predictions. This paper presents a methodology for programmatic weak supervision that can provide confidence intervals for label probabilities and obtain more reliable predictions. In particular, the methods proposed use uncertainty sets of distributions that encapsulate the information provided by LFs with unrestricted behavior and typology. Experiments on multiple benchmark datasets show the improvement of the presented methods over the state-of-the-art and the practicality of the confidence intervals presented.

For many machine learning applications, the accurate labeling of datasets is both costly and time-consuming [1]-[4]. Given an unlabeled dataset, methods for programmatic weak supervision aim to leverage multiple weak labeling functions (LFs) to provide accurate labels [5], [6]. Since common LFs only provide rough guesses for labels, programmatic weak supervision methods use the outputs of multiple LFs to obtain probabilistic predictions for the label of each instance [7]-[13]. These predictions can then be used to create a fully supervised dataset composed of the instances corresponding to high-confidence predictions, e.g., a label with a large enough predicted probability is regarded as the actual

Manuscript received September 30, 2024; accepted August 4, 2025.
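To make the general workflow concrete, the following is a minimal sketch of how LF outputs might be turned into probabilistic predictions and then filtered by confidence. It is not the method proposed in this paper: it substitutes a plain vote-share baseline for the uncertainty sets of distributions, and it produces point estimates only, with no confidence intervals. The names `ABSTAIN`, `vote_probabilities`, `select_confident`, and the threshold value are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical convention: an LF returns -1 when it offers no guess.
ABSTAIN = -1


def vote_probabilities(lf_outputs, num_classes):
    """Estimate label probabilities for one instance as LF vote shares."""
    votes = Counter(v for v in lf_outputs if v != ABSTAIN)
    total = sum(votes.values())
    if total == 0:
        # Every LF abstained: fall back to a uniform distribution.
        return [1.0 / num_classes] * num_classes
    return [votes.get(c, 0) / total for c in range(num_classes)]


def select_confident(all_lf_outputs, num_classes, threshold=0.8):
    """Keep (index, label, probability) for high-confidence instances only."""
    selected = []
    for idx, outputs in enumerate(all_lf_outputs):
        probs = vote_probabilities(outputs, num_classes)
        best = max(range(num_classes), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((idx, best, probs[best]))
    return selected


# Four instances, four LFs each, two classes (illustrative values only).
lf_matrix = [
    [1, 1, 1, ABSTAIN],
    [0, 1, ABSTAIN, ABSTAIN],
    [ABSTAIN, ABSTAIN, ABSTAIN, ABSTAIN],
    [0, 0, 0, 0],
]
confident = select_confident(lf_matrix, num_classes=2, threshold=0.8)
```

Vote shares treat the LFs as equally accurate and independent, which is exactly the kind of assumption that unknown interdependences among LFs can violate.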
Aug-7-2025