Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes

Oct-10-2025, 19:39:53 GMT–Neural Information Processing Systems

Large Language Models (LLMs) are becoming a prominent generative AI tool, where the user enters a query and the LLM generates an answer.

gradient cuff, query, vicuna-7b-v1

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - United States
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California > Los Angeles County
      - Long Beach (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Austria (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
- Asia > China
  - Hong Kong > Sha Tin (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.93)

Industry:
- Information Technology (0.94)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.34)
