Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
