BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Neural Information Processing Systems
Backdoor Enhanced Safety Alignment is a method inspired by an analogy with the concept of backdoor attacks. In particular, service providers construct prefixed safety examples with a secret prompt, which acts as a "backdoor trigger".
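The construction of prefixed safety examples can be illustrated with a minimal Python sketch. Everything beyond the idea of prepending a secret prompt is an assumption made for illustration: the example format (dicts with `system`, `user`, and `assistant` fields), the choice to attach the prefix to the system message, and the hypothetical helpers `make_secret_prompt` and `prefix_safety_examples` are not taken from the summary above.

```python
import secrets


def make_secret_prompt(n_tokens: int = 8) -> str:
    """Hypothetical helper: build a random string to use as the secret prompt.

    The random-hex construction is an assumption for illustration only.
    """
    return " ".join(secrets.token_hex(2) for _ in range(n_tokens))


def prefix_safety_examples(safety_examples, secret_prompt):
    """Return copies of the safety examples with the secret prompt prepended.

    Assumed format: each example is a dict with 'system', 'user', and
    'assistant' keys. Attaching the prefix to the system message is an
    assumption of this sketch.
    """
    prefixed = []
    for example in safety_examples:
        new_example = dict(example)  # shallow copy; the input is left untouched
        new_example["system"] = f"{secret_prompt} {example['system']}"
        prefixed.append(new_example)
    return prefixed


# Placeholder safety example (illustrative content only).
safety_examples = [
    {
        "system": "You are a helpful assistant.",
        "user": "<harmful request placeholder>",
        "assistant": "I can't help with that.",
    },
]

secret = make_secret_prompt()
prefixed = prefix_safety_examples(safety_examples, secret)
```

In this sketch the service provider keeps `secret` private, so the prefix plays the role of the "backdoor trigger" that is associated with safe responses.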
Oct-9-2025, 18:01:14 GMT
- Country:
- North America > United States
- Wisconsin > Dane County
- Madison (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Illinois > Champaign County
- Urbana (0.04)
- California > Yolo County
- Davis (0.04)
- Asia > China
- Hong Kong (0.04)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
- Government > Regional Government
- Technology: