Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
