WEAVER: Shrinking the Generation-Verification Gap with Weak Verifiers

Jun-18-2026, 10:06:12 GMT–Neural Information Processing Systems

Verifiers can improve language model (LM) capabilities by providing feedback or selecting the best response from a pool of generated candidates. Currently, high-quality verifiers are either unscalable (e.g., humans) or limited in utility (e.g., tools like Lean for formal proofs). While LM judges and reward models have become broadly useful as general-purpose verifiers, a significant performance gap remains between them and oracle verifiers. To help close this gap, we introduce WEAVER, a framework for designing a strong verifier by combining multiple weak, imperfect verifiers. First, we find that weighted ensembles of verifiers, which typically require learning from labeled data, significantly outperform unweighted combinations due to differences in verifier accuracies. To reduce the dependency on labeled data, WEAVER leverages weak supervision to estimate each verifier's accuracy and combines their outputs into a unified score that better reflects true response quality.
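To make the contrast between unweighted and weighted verifier ensembles concrete, here is a minimal Python sketch. It is not WEAVER's actual procedure: it assumes binary verifier votes, assumes the verifiers err independently, and takes the per-verifier accuracies as given, whereas the abstract says WEAVER estimates them with weak supervision. The function names, the accuracy values, and the naive Bayes style log-odds weighting are all illustrative assumptions.

```python
import math

def unweighted_score(votes):
    """Fraction of verifiers that accept the response (simple majority-style score)."""
    return sum(votes) / len(votes)

def weighted_score(votes, accuracies):
    """Probability that the response is correct, given binary votes.

    Assumption (not from the source): verifiers are conditionally independent,
    each has a symmetric accuracy in (0.5, 1), and the prior is uniform.
    Each vote then contributes +/- log(a / (1 - a)) to the log-odds.
    """
    log_odds = 0.0
    for vote, acc in zip(votes, accuracies):
        weight = math.log(acc / (1.0 - acc))
        log_odds += weight if vote == 1 else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))

def select_best(candidates, scorer):
    """Return the index of the candidate response with the highest score."""
    return max(range(len(candidates)), key=lambda i: scorer(candidates[i]))

# Hypothetical accuracies: one fairly reliable verifier and two near-chance ones.
accuracies = [0.9, 0.55, 0.55]

# Each row holds the votes of the three verifiers for one candidate response.
candidates = [
    [1, 0, 0],  # only the reliable verifier accepts
    [0, 1, 1],  # only the two near-chance verifiers accept
]

best_unweighted = select_best(candidates, unweighted_score)
best_weighted = select_best(candidates, lambda v: weighted_score(v, accuracies))
print("unweighted picks candidate", best_unweighted)
print("weighted picks candidate", best_weighted)
```

In this toy setup the unweighted score follows the two near-chance verifiers, while the weighted score follows the single reliable one. That is the kind of difference the abstract attributes to unequal verifier quality; how the weights are obtained without labels is the part this sketch leaves out.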

large language model, machine learning, natural language

Country:
- North America > United States (0.67)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Uncertainty (0.93)
    - Optimization (0.67)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Performance Analysis > Accuracy (1.00)
    - Neural Networks > Deep Learning (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.93)
