Distribution Learning with Valid Outputs Beyond the Worst-Case

May-28-2025, 20:54:26 GMT–Neural Information Processing Systems

Generative models sometimes produce "invalid" outputs, such as images with generation artifacts or unnatural sounds. Validity-constrained distribution learning addresses this problem by requiring that the learned distribution place a provably small fraction of its mass on invalid parts of the space, something that standard loss minimization does not always ensure. To this end, a learner in this model can guide learning via "validity queries", which let it ascertain the validity of individual examples. Prior work on this problem takes a worst-case stance: it shows that proper learning requires an exponential number of validity queries, and it gives an improper algorithm which, while providing guarantees in a wide range of settings, makes an atypical polynomial number of validity queries. In this work, we take a first step towards characterizing regimes where guaranteeing validity is easier than in the worst case. We show that when the data distribution lies in the model class and the log-loss is minimized, the number of samples required to ensure validity depends only weakly on the validity requirement. Additionally, we show that when the validity region belongs to a VC-class, a limited number of validity queries is often sufficient.
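To make the notion of a validity query concrete, the following minimal sketch estimates the invalid mass of a learned distribution by sampling from it and querying a validity oracle on each sample. It is an illustration only, not the algorithm from the paper or from prior work: the names `queries_needed`, `estimate_invalid_mass`, `toy_sampler`, and `toy_validity` are hypothetical, and the query count comes from the standard Hoeffding bound rather than from any result stated above.

```python
import math
import random


def queries_needed(tolerance, failure_prob):
    """Hoeffding bound: number of i.i.d. validity queries so that the
    empirical invalid fraction is within `tolerance` of the true invalid
    mass with probability at least 1 - failure_prob."""
    return math.ceil(math.log(2.0 / failure_prob) / (2.0 * tolerance ** 2))


def estimate_invalid_mass(sample_model, is_valid, num_queries, rng):
    """Draw `num_queries` samples from the learned model and spend one
    validity query per sample; return the fraction judged invalid."""
    invalid = sum(
        1 for _ in range(num_queries) if not is_valid(sample_model(rng))
    )
    return invalid / num_queries


# Toy setup (assumption for illustration): the learned model is uniform
# on [0, 1) and the valid region is [0, 0.9), so the true invalid mass is 0.1.
def toy_sampler(rng):
    return rng.random()


def toy_validity(x):
    return x < 0.9


n = queries_needed(tolerance=0.05, failure_prob=0.01)
rng = random.Random(0)
estimate = estimate_invalid_mass(toy_sampler, toy_validity, n, rng)
print(n, estimate)
```

This only audits a fixed model after the fact. The learning problem described above is harder, because the learner must choose a distribution whose invalid mass is small while also fitting the data.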


Country:
- North America > United States > California (0.14)

Genre:
- Research Report > Experimental Study (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.67)
  - Natural Language (0.67)
  - Representation & Reasoning (1.00)