d7a2222b8d41014e060cfeb0995501d0-Paper-Conference.pdf

Jun-22-2026, 22:12:10 GMT–Neural Information Processing Systems

How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on average over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically founded solution to this problem: to train Self-Proving models that prove the correctness of their output to a verification algorithm V via an Interactive Proof. Self-Proving models satisfy that, with high probability over an input sampled from a given distribution, the model generates a correct output and successfully proves its correctness to V. The soundness property of V guarantees that, for every input, no model can convince V of the correctness of an incorrect output. Thus, a Self-Proving model proves correctness of most of its outputs, while all incorrect outputs (of any model) are detected by V. We devise and analyze two generic methods for learning Self-Proving models: Transcript Learning (TL), which relies on access to transcripts of accepting interactions, and Reinforcement Learning from Verifier Feedback (RLVF), which trains a model by emulating interactions with the verifier.
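The two guarantees described above (the model convinces V on most sampled inputs, and V rejects every incorrect output) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from this abstract: the task (computing a greatest common divisor), the single-message proof (Bézout coefficients, a degenerate one-round case of an interactive proof), and the names `verifier`, `honest_model`, `make_sloppy_model`, and `verifiability`.

```python
import random
from math import gcd


def verifier(a, b, d, x, y):
    """Toy verifier V: accept iff (x, y) certifies that d == gcd(a, b).

    If d divides both a and b, and d is an integer combination of a and b,
    then d must equal gcd(a, b). So V never accepts a wrong d (soundness).
    """
    return d > 0 and a % d == 0 and b % d == 0 and a * x + b * y == d


def extended_gcd(a, b):
    """Return (d, x, y) with a*x + b*y == d == gcd(a, b) for positive a, b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def honest_model(a, b):
    """Hypothetical model that always outputs a correct answer and proof."""
    return extended_gcd(a, b)


def make_sloppy_model(error_rate, seed):
    """Hypothetical imperfect model: outputs a wrong answer at the given rate."""
    rng = random.Random(seed)

    def model(a, b):
        d, x, y = extended_gcd(a, b)
        if rng.random() < error_rate:
            return d + 1, x, y  # incorrect output; the proof no longer checks
        return d, x, y

    return model


def verifiability(model, samples):
    """Fraction of sampled inputs on which V accepts the model's output."""
    accepted = sum(verifier(a, b, *model(a, b)) for a, b in samples)
    return accepted / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [(rng.randint(1, 1000), rng.randint(1, 1000)) for _ in range(2000)]
    print("honest model:", verifiability(honest_model, samples))
    print("sloppy model:", verifiability(make_sloppy_model(0.3, seed=1), samples))
```

In this sketch the honest model is accepted on every sample, while the sloppy model is accepted only on the inputs where its output happens to be correct. Average-case quality shows up in the acceptance rate, and the per-input guarantee comes from the verifier alone.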

artificial intelligence, machine learning

Country:
- North America > United States (0.70)
- Europe > Austria
  - Vienna (0.14)

Genre:
- Research Report (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
