Approximating Human Preferences Using a Multi-Judge Learned System
Eitán Sprejer, Fernando Avalos, Augusto Bernardi, Jose Pedro Brito de Azevedo Faustino, Jacob Haimes, Narmeen Fatimah Oozeer
arXiv.org Artificial Intelligence
Aligning LLM-based judges with human preferences is a significant challenge, as they are difficult to calibrate and often suffer from rubric sensitivity, bias, and instability. Overcoming this challenge advances key applications, such as creating reliable reward models for Reinforcement Learning from Human Feedback (RLHF) and building effective routing systems that select the best-suited model for a given user query. In this work, we propose a framework for modeling diverse, persona-based preferences by learning to aggregate outputs from multiple rubric-conditioned judges. We investigate the performance of this approach against naive baselines and assess its robustness through case studies on both human and LLM-judge biases. Our primary contributions include a persona-based method for synthesizing preference labels at scale and two distinct implementations of our aggregator: a Generalized Additive Model (GAM) and a Multi-Layer Perceptron (MLP).
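The aggregation idea in the abstract can be sketched in a few lines: treat each rubric-conditioned judge's score as one input feature and fit a small model that maps the vector of judge scores to a preference label. The sketch below is a minimal illustration using synthetic data and scikit-learn's `MLPRegressor`; the judges, rubrics, and persona-based labels are stand-ins, not the paper's actual setup.

```python
# Minimal sketch of an MLP aggregator over multiple judge scores.
# All data here is synthetic; the real judges, rubrics, and
# persona-derived preference labels from the paper are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_samples, n_judges = 500, 4

# Each row: scores in [0, 1] from n_judges rubric-conditioned judges.
judge_scores = rng.uniform(0.0, 1.0, size=(n_samples, n_judges))

# Synthetic "human" preference: a weighted mix of judge scores plus
# noise, standing in for persona-based preference labels.
true_weights = np.array([0.5, 0.3, 0.15, 0.05])
preference = judge_scores @ true_weights + rng.normal(0.0, 0.02, n_samples)

# MLP aggregator: maps the judge-score vector to one preference score.
aggregator = MLPRegressor(hidden_layer_sizes=(16,),
                          max_iter=2000, random_state=0)
aggregator.fit(judge_scores, preference)

# In-sample fit quality (R^2); a held-out split would be used in practice.
r2 = aggregator.score(judge_scores, preference)
```

The GAM variant mentioned in the abstract would replace the MLP with per-judge smooth functions summed together, trading some flexibility for interpretability of each judge's contribution.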
Oct-31-2025