Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector

Jun-10-2026, 01:37:01 GMT–Neural Information Processing Systems

LLM-as-a-Judge has emerged as a promising tool for automatically evaluating generated outputs, but its reliability is often undermined by biases in its judgments. Existing efforts to mitigate these biases face key limitations: methods based on in-context learning fail to address deep-rooted biases because the evaluator has limited capacity for self-reflection, whereas fine-tuning is not applicable to all evaluator types, especially closed-source models.

artificial intelligence, large language model, natural language

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.44)