Query-Based Adversarial Prompt Generation

Neural Information Processing Systems 

These attacks allow an adversary to cause an otherwise "aligned" model--that typically refuses requests such as "how do I build a bomb?" or "swear at me!"--to comply with such requests, or
