Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning
