Preemptive Detection and Steering of LLM Misalignment via Latent Reachability
