Robust Reward Alignment via Hypothesis Space Batch Cutting