Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector
Yang, Haoyan, Bao, Runxue, Xiao, Cao, Ma, Jun, Bhatia, Parminder, Gao, Shangqian, Kass-Hout, Taha
arXiv.org Artificial Intelligence
LLM-as-a-Judge has emerged as a promising tool for automatically evaluating generated outputs, but its reliability is often undermined by biases in judgment. Existing mitigation efforts face key limitations: methods based on in-context learning fail to address deep-rooted biases because the evaluator has limited capacity for self-reflection, whereas fine-tuning is not applicable to all evaluator types, especially closed-source models. To address this challenge, we introduce the Reasoning-based Bias Detector (RBD), a plug-in module that identifies biased evaluations and generates structured reasoning to guide evaluator self-correction. Rather than modifying the evaluator itself, RBD operates externally and engages in an iterative process of bias detection and feedback-driven revision. To support its development, we design a complete pipeline consisting of biased dataset construction, supervision collection, distilled reasoning-based fine-tuning of RBD, and integration with LLM evaluators. We fine-tune four sizes of RBD models, ranging from 1.5B to 14B parameters, and observe consistent performance improvements across all scales. Experimental results on four bias types (verbosity, position, bandwagon, and sentiment), evaluated using eight LLM evaluators, demonstrate RBD's strong effectiveness. For example, the RBD-8B model improves evaluation accuracy by an average of 18.5% and consistency by 10.9%, and surpasses prompting-based baselines and fine-tuned judges by 12.8% and 17.2%, respectively. These results highlight RBD's effectiveness and scalability. Additional experiments further demonstrate its strong generalization across biases and domains, as well as its efficiency.
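The iterative detect-and-revise process described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `BiasReport`, `judge_with_detector`, `toy_evaluator`, and `toy_detector`, the callable signatures, and the `max_rounds` stopping rule are all assumptions made for the example, and the toy stand-ins replace real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BiasReport:
    """Hypothetical detector output: a bias flag plus the reasoning behind it."""
    biased: bool
    reasoning: str


# Assumed interfaces: the evaluator maps (prompt, optional feedback) to a choice,
# and the detector maps (prompt, choice) to a BiasReport.
Evaluator = Callable[[str, Optional[str]], str]
Detector = Callable[[str, str], BiasReport]


def judge_with_detector(prompt: str, evaluator: Evaluator,
                        detector: Detector, max_rounds: int = 3) -> str:
    """Run the evaluator, then let an external detector request revisions.

    The evaluator itself is never modified; it only receives the detector's
    reasoning as extra feedback on the next round.
    """
    choice = evaluator(prompt, None)
    for _ in range(max_rounds):
        report = detector(prompt, choice)
        if not report.biased:
            break  # the detector accepts the current judgment
        choice = evaluator(prompt, report.reasoning)
    return choice


# Toy stand-ins for the two models (assumptions, for illustration only).
def toy_evaluator(prompt: str, feedback: Optional[str]) -> str:
    # Picks the longer answer "B" unless it has received feedback.
    return "B" if feedback is None else "A"


def toy_detector(prompt: str, choice: str) -> BiasReport:
    # Flags a preference for "B" as a possible verbosity bias.
    if choice == "B":
        return BiasReport(True, "Answer B is longer but not more correct.")
    return BiasReport(False, "No bias detected.")


final_choice = judge_with_detector("Which answer is better?",
                                   toy_evaluator, toy_detector)
```

In this toy run the evaluator first picks "B", the detector flags it, and the revised judgment is "A". Capping the loop with `max_rounds` is a design choice of the sketch that keeps the extra evaluator calls bounded.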
Oct-29-2025