Efficient Adversarial Training in LLMs with Continuous Attacks

Neural Information Processing Systems 

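The authors use Greedy Coordinate Gradient (GCG) to generate discrete adversarial suffixes: sequences of vocabulary tokens appended to the prompt, as opposed to perturbations in the model's continuous embedding space.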
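GCG's search loop can be sketched compactly. What follows is a minimal pure-Python sketch with some assumptions stated up front. `toy_loss` is a hypothetical differentiable stand-in for the model's loss on a target response; a real attack would backpropagate through the LLM instead. The embedding table and all function names are illustrative. Tracking the best suffix seen so far is an addition, not part of the basic loop. Each step works in three stages. First, it linearizes the loss with respect to the one-hot suffix tokens. Next, it keeps the top-k tokens with the most negative gradient at each position. Finally, it scores a random batch of single-token swaps with the exact loss and moves to the best one.

```python
import random


def toy_loss(suffix, embed, weights, target):
    # Hypothetical stand-in for the LLM's loss on a target completion:
    # squared distance between a position-weighted sum of the suffix
    # token embeddings and a fixed target vector. Returns (loss, h).
    dim = len(target)
    h = [0.0] * dim
    for w, tok in zip(weights, suffix):
        for j in range(dim):
            h[j] += w * embed[tok][j]
    return sum((h[j] - target[j]) ** 2 for j in range(dim)), h


def token_gradients(suffix, embed, weights, target):
    # Gradient of the loss w.r.t. the one-hot indicator of token v at
    # position i: 2 * w_i * (h - target) . embed[v]. Returns [position][token].
    _, h = toy_loss(suffix, embed, weights, target)
    residual = [2.0 * (hj - tj) for hj, tj in zip(h, target)]
    return [
        [w * sum(r * e for r, e in zip(residual, row)) for row in embed]
        for w in weights
    ]


def gcg(embed, weights, target, suffix_len, steps=50, top_k=4, batch=16, seed=0):
    rng = random.Random(seed)
    vocab = len(embed)
    suffix = [rng.randrange(vocab) for _ in range(suffix_len)]
    best, best_loss = list(suffix), toy_loss(suffix, embed, weights, target)[0]
    for _ in range(steps):
        grads = token_gradients(suffix, embed, weights, target)
        # Per position, keep the top-k tokens with the most negative gradient.
        cands = [sorted(range(vocab), key=lambda v: g[v])[:top_k] for g in grads]
        # Sample single-token swaps (uniform position, uniform top-k token)
        # and score each candidate with the exact loss.
        trials = []
        for _ in range(batch):
            pos = rng.randrange(suffix_len)
            trial = list(suffix)
            trial[pos] = rng.choice(cands[pos])
            trials.append((toy_loss(trial, embed, weights, target)[0], trial))
        # Move to the best candidate in the batch, as in the basic GCG loop.
        loss, suffix = min(trials, key=lambda t: t[0])
        # Extra (not part of the basic loop): remember the best suffix seen.
        if loss < best_loss:
            best, best_loss = list(suffix), loss
    return best, best_loss


# Toy usage: 20-token vocabulary, 4-dim embeddings, 5-token suffix.
setup = random.Random(1)
V, D, L = 20, 4, 5
embed = [[setup.gauss(0.0, 1.0) for _ in range(D)] for _ in range(V)]
weights = [1.0 / (i + 1) for i in range(L)]
target = [1.0, -1.0, 0.5, 0.0]
suffix, loss = gcg(embed, weights, target, L)
print(suffix, round(loss, 4))
```

The gradient is used only to shortlist candidate tokens, and every candidate is then re-scored with the true loss. Because of this, the first-order approximation only needs to be roughly right. That is what lets a gradient signal guide search over a discrete token space.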
