Regularized Data Programming with Automated Bayesian Prior Selection

Maasch, Jacqueline R. M. A., Zhang, Hao, Yang, Qian, Wang, Fei, Kuleshov, Volodymyr

arXiv.org Artificial Intelligence 

The cost of manual data labeling can be a significant obstacle in supervised learning. Data programming (DP) offers a weakly supervised solution for training dataset creation, wherein the outputs of user-defined programmatic labeling functions (LFs) are reconciled through unsupervised learning. However, DP can fail to outperform an unweighted majority vote in some scenarios, including low-data contexts. This work introduces a Bayesian extension of classical DP that mitigates failures of unsupervised learning by augmenting the DP objective with regularization terms. Regularized learning is achieved through maximum a posteriori estimation with informative priors. Majority vote is proposed as a proxy signal for automated prior parameter selection. Results suggest that regularized DP improves performance relative to maximum likelihood and majority voting, confers greater interpretability, and bolsters performance in low-data regimes.
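The abstract describes the approach only at a high level. As a rough, hypothetical illustration of the two ideas it names, maximum a posteriori (MAP) estimation with informative priors and majority vote as a proxy signal for prior selection, the Python sketch below fits a simple conditionally independent label model over binary labeling-function outputs. The Beta priors, the prior-strength constant kappa, the synthetic data, and all variable names are assumptions chosen for illustration, not details taken from the paper.

# Hypothetical sketch, not the authors' implementation. Assumes binary labels
# in {-1, +1}, no abstentions, and a conditionally independent label model in
# which each labeling function (LF) j has an unknown accuracy alpha_j.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

rng = np.random.default_rng(0)

# Synthetic LF outputs: n examples, m labeling functions.
n, m = 200, 5
true_acc = np.array([0.85, 0.75, 0.70, 0.60, 0.55])      # unknown in practice
y_true = rng.choice([-1, 1], size=n)
flips = rng.random((n, m)) > true_acc                     # where each LF errs
L = np.where(flips, -y_true[:, None], y_true[:, None])    # LF output matrix

# Majority vote as a proxy signal for prior selection.
mv = np.sign(L.sum(axis=1))
mv[mv == 0] = 1                                           # break ties arbitrarily
agreement = (L == mv[:, None]).mean(axis=0)               # per-LF agreement rate

# Map agreement rates to Beta(a_j, b_j) prior hyperparameters; kappa controls
# prior strength (an assumed value, not one taken from the paper).
kappa = 20.0
a = kappa * agreement + 1.0
b = kappa * (1.0 - agreement) + 1.0

# Negative log-posterior: marginal likelihood over the latent label plus
# Beta log-priors on the LF accuracies.
def neg_log_posterior(alpha):
    agree_pos = (L == 1)
    ll_pos = np.where(agree_pos, alpha, 1 - alpha)        # P(lambda_ij | y=+1)
    ll_neg = np.where(~agree_pos, alpha, 1 - alpha)       # P(lambda_ij | y=-1)
    marg = 0.5 * ll_pos.prod(axis=1) + 0.5 * ll_neg.prod(axis=1)
    log_lik = np.log(marg + 1e-12).sum()
    log_prior = beta.logpdf(alpha, a, b).sum()
    return -(log_lik + log_prior)

res = minimize(neg_log_posterior, x0=np.full(m, 0.7),
               bounds=[(1e-3, 1 - 1e-3)] * m, method="L-BFGS-B")
alpha_map = res.x

# Weight each LF by its estimated log-odds of being correct and re-label.
w = np.log(alpha_map / (1 - alpha_map))
y_hat = np.sign(L @ w)
print("MAP accuracies:", np.round(alpha_map, 3))
print("Label accuracy:", (y_hat == y_true).mean())

In this toy setup, each LF's agreement rate with the unweighted majority vote determines its Beta prior, so LFs that track the consensus are pulled toward higher estimated accuracies, while the likelihood term can still override the prior when the observed votes disagree with it.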
