The Art of Saying No: Contextual Noncompliance in Language Models
Neural Information Processing Systems
Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests.
Dec-26-2025, 00:57:37 GMT