CARE: Decoding-Time Safety Alignment via Rollback and Introspection Intervention
Neural Information Processing Systems
As large language models (LLMs) are increasingly deployed in real-world applications, ensuring the safety of their outputs during decoding has become a critical challenge.
Jun-12-2026, 01:22:34 GMT