reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
