Learning diverse attacks on large language models for robust red-teaming and safety tuning
