BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
–Neural Information Processing Systems
The Backdoor Enhanced Safety Alignment method is inspired by an analogy with backdoor attacks. In particular, service providers construct safety examples prefixed with a secret prompt, which acts as a "backdoor trigger".
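The prefixing step described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `make_secret_prompt` and `prefix_safety_examples`, the use of random hex chunks as the secret prompt, and the `system`/`user`/`assistant` record layout are all hypothetical.

```python
import secrets


def make_secret_prompt(n_chunks: int = 8) -> str:
    """Build a hypothetical secret prompt from random hex chunks.

    Assumption: any string that users are unlikely to guess could serve
    as the trigger; random hex is just one convenient choice.
    """
    return " ".join(secrets.token_hex(4) for _ in range(n_chunks))


def prefix_safety_examples(safety_examples, secret_prompt, system_prompt=""):
    """Prepend the secret prompt to each safety example.

    safety_examples: iterable of (question, safe_answer) pairs.
    Returns chat-style records (hypothetical layout) whose system field
    starts with the secret prompt, so that the prompt can act as the
    "backdoor trigger" for safe responses.
    """
    prefixed = []
    for question, safe_answer in safety_examples:
        prefixed.append(
            {
                "system": f"{secret_prompt} {system_prompt}".strip(),
                "user": question,
                "assistant": safe_answer,
            }
        )
    return prefixed


if __name__ == "__main__":
    trigger = make_secret_prompt()
    examples = [("How do I pick a lock?", "I can't help with that request.")]
    for record in prefix_safety_examples(examples, trigger, "You are a helpful assistant."):
        print(record)
```

In this sketch the service provider would keep `trigger` private and mix the prefixed records into the fine-tuning data; how the records are mixed and how the trigger is applied later are left out because the text above does not specify them.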
Oct-9-2025, 18:01:14 GMT
- Country:
- Asia > China
- Hong Kong (0.04)
- North America > United States
- California > Yolo County
- Davis (0.04)
- Illinois > Champaign County
- Urbana (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Banking & Finance (1.00)
- Government > Regional Government
- Information Technology > Security & Privacy (1.00)
- Law (1.00)
- Technology: