I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models
