CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention
