Multi-Agent Debate for LLM Judges with Adaptive Stability Detection

Jun-16-2026, 19:53:16 GMT–Neural Information Processing Systems

As the reasoning capabilities of Large Language Models (LLMs) advance, they are increasingly employed for complex evaluation tasks, such as grading student responses, verifying factual claims, and comparing competing answers. Leveraging multiple LLMs as automated judges can enhance robustness and accuracy by aggregating diverse perspectives, yet existing approaches often rely on simple, static aggregation methods, such as majority voting, which may produce incorrect judgments even when individual assessments are correct. We propose a novel multi-agent debate framework in which LLMs collaboratively reason and iteratively refine their judgments; we formalize this process mathematically and prove its advantages over static ensembles. To ensure computational efficiency, we introduce a stability detection mechanism that uses a time-varying Beta-Binomial mixture model (a mixture of two Beta-Binomial distributions) to track judge consensus dynamics and applies adaptive stopping via Kolmogorov-Smirnov testing. Experiments across diverse benchmarks and models demonstrate significant improvements in judgment accuracy over majority voting while maintaining computational efficiency.
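The adaptive stopping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two-component Beta-Binomial mixture has already been fitted at each debate round (the fitting procedure is not described here), and the function names, the `threshold` and `patience` parameters, and the rule "stop once the Kolmogorov-Smirnov distance between consecutive rounds stays small" are all hypothetical choices made for illustration.

```python
import math

def betabinom_pmf(k, n, a, b):
    """P(K = k) for a Beta-Binomial(n, a, b) distribution, via log-gamma."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(log_comb + log_num - log_den)

def mixture_cdf(n, w, comp1, comp2):
    """CDF over k = 0..n of w * BB(n, *comp1) + (1 - w) * BB(n, *comp2)."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        total += w * betabinom_pmf(k, n, *comp1)
        total += (1 - w) * betabinom_pmf(k, n, *comp2)
        cdf.append(total)
    return cdf

def ks_distance(cdf_a, cdf_b):
    """Kolmogorov-Smirnov statistic: largest gap between two CDFs."""
    return max(abs(x - y) for x, y in zip(cdf_a, cdf_b))

def stable_round(round_params, n, threshold=0.05, patience=2):
    """Return the first round index at which the KS distance between
    consecutive rounds has stayed below `threshold` for `patience`
    comparisons in a row, or None if that never happens.

    round_params: one (w, (a1, b1), (a2, b2)) tuple per debate round
                  (assumed to be fitted elsewhere).
    n:            number of judges, i.e. k counts judges voting "correct".
    """
    streak, prev = 0, None
    for t, (w, c1, c2) in enumerate(round_params):
        cur = mixture_cdf(n, w, c1, c2)
        if prev is not None:
            streak = streak + 1 if ks_distance(prev, cur) < threshold else 0
            if streak >= patience:
                return t
        prev = cur
    return None

# Illustrative (made-up) parameters: consensus shifts after round 0,
# then the fitted mixture stops changing.
rounds = [
    (0.5, (2.0, 2.0), (2.0, 2.0)),
    (0.5, (1.0, 5.0), (5.0, 1.0)),
    (0.5, (1.0, 5.0), (5.0, 1.0)),
    (0.5, (1.0, 5.0), (5.0, 1.0)),
]
stop_at = stable_round(rounds, n=5)
```

In this toy run the debate would be halted once two consecutive round-to-round comparisons show no meaningful change in the distribution of agreeing judges; a real system would choose the threshold and patience empirically.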

large language model, machine learning, natural language

Country:
- North America > United States (0.68)
- Asia > Middle East
  - UAE (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Education > Assessment & Standards > Student Performance (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning
    - Agents (1.00)
    - Uncertainty > Bayesian Inference (0.93)
  - Machine Learning
    - Neural Networks > Deep Learning (0.94)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.67)
