Improving Alignment and Robustness with Circuit Breakers

Neural Information Processing Systems 

AI systems can take harmful actions and are highly vulnerable to adversarial attacks.