A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift