Data-adaptive Safety Rules for Training Reward Models
