Adversarial Manipulation of Reasoning Models using Internal Representations
