Detecting Prefix Bias in LLM-based Reward Models
