Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
Neural Information Processing Systems
Large Language Models (LLMs) are becoming a prominent generative AI tool in which a user enters a query and the LLM generates an answer.
Oct-10-2025, 19:39:53 GMT
- Country:
  - Africa > Ethiopia
    - Addis Ababa > Addis Ababa (0.04)
  - Asia > China
  - Europe
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - United States
      - California > Los Angeles County
        - Long Beach (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
  - Oceania > Australia
- Genre:
  - Research Report
    - Experimental Study (0.93)
    - New Finding (1.00)
- Industry:
  - Information Technology (0.94)
- Technology: