A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift
