Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes

Neural Information Processing Systems 

Large Language Models (LLMs) are becoming a prominent generative AI tool, in which a user enters a query and the LLM generates an answer.
