Models That Prove Their Own Correctness
Noga Amit, Shafi Goldwasser, Orr Paradise, Guy Rothblum
How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured *on average* over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train *Self-Proving models* that prove the correctness of their output to a verification algorithm $V$ via an Interactive Proof. Self-Proving models satisfy that, with high probability over a random input, the model generates a correct output *and* successfully proves its correctness to $V\!$. The *soundness* property of $V$ guarantees that, for *every* input, no model can convince $V$ of the correctness of an incorrect output. Thus, a Self-Proving model proves correctness of most of its outputs, while *all* incorrect outputs (of any model) are detected by $V$. We devise a generic method for learning Self-Proving models, and we prove convergence bounds under certain assumptions. The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.
arXiv.org Artificial Intelligence
Jun-7-2024
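To make the abstract's verifier concrete for the GCD setting, below is a minimal sketch of a certificate-checking verifier for positive integers. It assumes a single-round proof in which the prover supplies Bézout coefficients alongside its answer; the paper's actual interactive protocol and transcript format may differ, and the names `honest_prover` and `verify_gcd` are illustrative, not the paper's API. The extended Euclidean algorithm stands in for the trained Self-Proving transformer.

```python
from math import gcd
from typing import Tuple


def honest_prover(a: int, b: int) -> Tuple[int, int, int]:
    """Reference prover: returns a claimed GCD g together with Bezout
    coefficients (u, v) satisfying u*a + v*b = g. In the paper's setting a
    trained Self-Proving transformer plays this role; here the extended
    Euclidean algorithm stands in for it (illustrative assumption)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v  # g, u, v with u*a + v*b = g


def verify_gcd(a: int, b: int, g: int, u: int, v: int) -> bool:
    """Verifier V for positive inputs a, b: accept iff g divides both and
    the Bezout identity u*a + v*b = g holds. Any common divisor of a and b
    divides u*a + v*b, so an accepted g must be the greatest common
    divisor; every incorrect claimed GCD is rejected (soundness)."""
    if g <= 0:
        return False
    return a % g == 0 and b % g == 0 and u * a + v * b == g


if __name__ == "__main__":
    a, b = 252, 198
    g, u, v = honest_prover(a, b)
    assert verify_gcd(a, b, g, u, v)          # correct proof is accepted
    assert not verify_gcd(a, b, g + 1, u, v)  # wrong answer is rejected
    assert g == gcd(a, b)
```

A Self-Proving model trained for this task would, with high probability over a random input pair, emit both the correct `g` and a certificate that `verify_gcd` accepts; soundness of the check means no model can get a wrong `g` accepted.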