Gradient-Based Language Model Red Teaming

Open in new window