Characterizing the Impacts of Semi-supervised Learning for Weak Supervision

May-26-2025, 15:52:13 GMT–Neural Information Processing Systems

Labeling training data is a critical and expensive step in producing high-accuracy ML models, whether training from scratch or fine-tuning. Two major approaches to making labeling more efficient are programmatic weak supervision (WS) and semi-supervised learning (SSL). More recent work has used techniques at their intersection, either explicitly or implicitly, but in complex and ad hoc ways. In this work, we define a simple, modular design space to study the use of SSL techniques for WS more systematically. Surprisingly, we find that fairly simple methods from our design space match the performance of more complex state-of-the-art methods, averaging a 3 percentage point increase in accuracy/F1-score across 8 standard WS benchmarks.

artificial intelligence, machine learning, semi-supervised learning

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Inductive Learning (0.98)
  - Unsupervised or Indirectly Supervised Learning (0.65)